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CHAPTER  1 


Introduction 

Spectrum  Estimation  and  DigitaJ  Filtering  have  continued  to  be  two  of  the  most  important  research  areas 
among  the  signal  processing  community  over  the  last  three  decades.  As  part  of  the  work  under  this  proposal, 
several  reseaurch  problems  of  current  interest  have  been  addressed  and  solved  satisfactorily.  This  final  report 
contauns  the  details  of  adl  the  results  of  the  research  that  have  been  au:complisbed  over  the  entire  period.  Much 
of  the  results  contained  in  this  report  have  either  been  presented/published  or  are  under  review/preparation  for 
future  publication.  The  papers/publications  ensuing  from  this  research  are  listed  at  the  end  of  this  introductory 
Chapter.  Copies  of  the  papers  and  publications  will  be  included  with  the  Invention  Report. 

The  research  conducted  under  this  proposad  can  be  categorized  primarily  into  two  broad  themes,  viz., 

(i)  Digitad  Filtering  ;  The  following  problems  have  been  addressed  ; 

(a)  EfiScient  synthesis  of  2-D  Digital  Filters  using  1-D  modules 

(b)  Optimal  design  of  1-D  and  2-D  Digital  Filters  from  Impulse  Response 

(c)  Optimal  Identification  of  Discrete-time  Multivariable  Systems  from  Impulse  Response  Matrix 

(ii)  Spectrum  Estimation  : 

(a)  Development  of  efficient  algorithms  for  estimation  of  the  frequencies/aurival-angles  of  narrowband  and 
widebamd  sources 

(b)  Development  of  an  Order-recursive  adgorithm  for  AR-Bispectrum  Estimation 

(c)  Application  of  newly  emerging  non-linear  prediction  methods  for  speech  analysis  and  synthesis 

(d)  Utilization  of  Linear  Prediction  parameters  for  training  Time  Delay  Neural  Networks  for  speech  recog¬ 
nition 

The  report  is  organized  as  follows:  In  Chapter  2,  the  work  on  Digital  Filtering  is  reported  whereas  in  Chapter 
3,  the  Spectral  Estimation  area  is  covered  in  detail.  Individual  Chapters  are  divided  into  severzd  Sections  by  topics. 
In  the  following  paragraphs  the  main  results  obtained  in  these  Sections  are  outlined  very  briefly. 

CHAPTER  2,  DIGITAL  FILTERS  :  REALIZATION  AND  SYNTHESIS 
Section  -  2.1  :  Exact  Realization  of  2-D  UR  Filters  Using  1-D  Modules 

A  method  for  exact  realization  of  2-dimen8ional  digital  HR  filters  using  separable  l-dimensional  modules  is 
presented  [1].  The  proposed  design  utilizes  a  theorem  on  separability  of  multivariable  polynomials  for  writing 
a  2-D  polynomial  as  sum-of-product  of  1-D  polynomials  of  successively  diminishing  orders.  When  compared  to 
existing  methods  based  on  singular  value  decomposition  (SVD)  and  Jordan  form  decomposition  (JD),  the  pro¬ 
posed  approach  has  reduced  hardware  complexity  for  filter  implementation.  It  is  also  shown  that  the  method  has 
the  same  complexity  as  the  lower-upper  (LU)  matrix  decomposition  based  method.  But  unlike  the  decomposi¬ 
tion  based  methods,  the  filter  coefficients  are  found  directly  from  the  impulse  response  matrices  by  simple  and 
numerically  reliable  mathematical  operations. 

It  has  also  been  shown  that  utilizing  the  inherent  modularity  in  the  way  the  2-D  polynomials  has  been 
rewritten,  the  complete  2-D  transfer  function  can  be  built  with  a  number  of  first  and  second  order  1-D  filter 
blocks.  Hence,  recent  advances  in  VLSI  methodologies  can  be  utilized  to  facilitate  mass  production  of  2-D  filters. 
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Section  •  2.2  :  Optimal  Identification  of  1*D  Discrete- Time  Systems  from  Impulse  Response  Data 

Aq  optimal  aJgorithm  for  estimation  of  the  parameters  of  rational  transfer  functions  from  prescribed  impulse 
response  data  is  presented  [2].  One  of  the  major  contributions  of  this  part  of  the  work  is  that,  for  the  first  time, 
an  error  minimization  criterion  has  been  theoretically  derived  which  is  uniformly  applicable  to  rational  models 
with  arbitrary  numbers  of  poles  and  zeros.  This  is  a  very  important  result  because  existing  methods  either  modify 
the  true  non-linear  error  criterion  in  the  theoretical  derivation  or  require  the  transfer  function  model  to  have 
exactly  one  less  number  of  zeros  than  poles.  In  the  proposed  algorithm,  the  transfer  function  coefBcients  are 
estimated  by  minimizing  the  f2-norm  of  the  exact  model  fitting  error.  It  is  shown  that  the  complete  basis  space 
orthogonal  to  the  model  fitting  error  can  be  constructed  with  the  coefBcients  of  the  denominator  polynomial 
only.  The  multidimensional  non-linear  error  criterion  is  decoupled  into  a  purely  linear  and  a  purely  nonlinear 
subproblem.  Global  optimality  properties  of  the  decoupled  estimators  are  established.  The  inherent  mathematical 
structure  in  the  non-linear  subproblem  is  exploited  in  formulating  an  efficient  iterative  computational  algorithm 
for  its  minimization.  The  proposed  algorithm  provides  a  powerful  and  comprehensive,  theoretical  as  well  as 
computational  framework  for  modeling  general  pole-zero  (ARMA)  and  all-pole  (AR)  systems  from  prescribed 
impulse  response  data.  It  is  shown  that  the  algorithm  can  be  utilized  for  identifying  all-zero  (MA)  systems  also. 

Before  the  general  optimal  algorithm  mentioned  above  was  discovered,  we  had  developed  some  suboptimal 
designs  for  proper  transfer  functions  with  equal  number  of  poles  and  zeros.  This  work  is  reported  in  [3]  which 
will  be  included  with  the  Invention  report  but  these  suboptimal  results  are  not  described  in  Section  2.2. 

Section  -  2.3  :  Design  of  2-D  Recursive  Digital  Filters  iVom  Spatial  Domain  Data  -  Strictly  Proper 
Case 

A  class  of  least-squares  algorithms  for  design  of  two-dimensional  digital  filters  from  space  domain  data  is 
presented  [4,  15].  The  proposed  algorithms  iteratively  estimate  the  filter  coefficients  by  minimizing  the  true 
squared  error  between  the  given  and  the  estimated  space  domedn  responses.  The  algorithms  are  essentially 
generalization  of  an  existing  1-D  design  algorithm  given  by  Evans  and  Fischl  (EFM)  which  is  known  to  be 
optimal  when  the  number  of  zeros  in  the  transfer  function  is  one  less  than  the  number  of  poles.  Though  some 
work  extending  EFM  had  earlier  been  reported,  the  full  potential  of  EFM  was  never  meule  use  of  because  the  true 
reparameterized  error  criterion  was  not  derived  and  also  the  second  phase  of  EFM  was  never  evoked.  Also,  unlike 
the  earlier  methods,  the  error  criterion  is  simultaneously  optimized  w.r.t.  the  coefficients  in  both  dimensions. 
Design  algorithms  are  given  for  filters  with  separable  and  irreducible  numerator/denominator  polynomials  and 
also  for  mixed  structures. 

Section  -  2.4  :  Identification  Of  Discrete  Time  Multivariable  Systems  from  Impulse  Response  Data 

The  problem  of  identification  of  transfer  function  matrices  of  discrete  time  multivariable  systems  is  addressed 
[5].  The  proposed  technique  obtains  an  optimal  approximation  from  the  given  (possibly  noisy)  measured  impulse 
response  data.  It  is  assumed  that  the  measured  impulse  response  data  corresponds  to  a  system  with  a  strictly 
proper  transfer  function  matrix  with  common  denominators  and  different  numerator  polynomials.  Based  on 
the  proposed  theoretical  basis,  an  efficient  computational  algorithm  is  developed  and  illustrated  by  means  of 
several  examples.  In  [16],  we  propose  another  algorithm  that  obtains  a  common  numerator  as  well  as  a  common 
denominator  polynomial  for  all  the  elements  of  a  transfer  function  matrix.  This  design  e8senti^dly  produces  a 
common  controller  with  different  gains  for  several  plants. 

Realization  of  2-D  State-Space  Filters  With  Fewer  Multipliers 

This  work  was  published  [6]  during  the  proposal  period  though  the  major  part  of  the  work  was  completed 
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before  the  inception  of  this  research.  Hence  only  a  brief  summary  is  being  included  only  in  the  Introduction 
and  the  paper  will  also  be  included  with  the  Invention  report.  In  this  paper,  it  has  been  shown  that  under 
certain  controllability  and  observability  conditions  on  the  I-D  block  diagonal  subsystems,  a  reduction  in  the 
number  of  multipliers  for  hardware  realization  can  be  achieved.  Compared  to  a  related  existing  method  which 
requires  2nm  +  3(n  +  m)  +  1  multipliers,  the  proposed  transformation  reduces  the  multiplier  requirement  by 
(n  +  m),  where  n  and  m  denote  the  respective  dimensions  of  individual  I-D  blocks.  This  saving  in  cost  may  be 
substantial  if  the  filter  order  is  high.  A  systematic  procedure  for  obtaining  the  coefficients  of  the  minimal  number 
of  multipliers  is  also  given  in  the  paper  along  with  a  detailed  numerical  example  which  illustrate  the  accuracy  of 
the  proposed  method. 

CHAPTER  3  .  SPECTRUM  ESTIMATION  AND  RELATED  TOPICS 
Section  -  3.1  :  A  Cyclic  Algorithm  for  Maximum  Likelihood  Frequency  Estimation 

A  simple  cyclic  algorithm  for  estimation  of  multiple  frequencies  of  narrow-band  sources  from  noisy  data  is 
given  [7].  The  algorithm  iteratively  and  recursively  updates  each  unknown  frequency  by  minimizing  the  model 
fitting  error.  For  Gaussianly  distributed  noise,  the  algorithm  produces  maximum  likelihood  estimates,  otherwise 
least-squares  estimates  are  foimd.  At  each  step  of  the  algorithm,  the  optimization  problem  is  w.r.t.  a  single 
frequency  only  and  hence,  simple  hardware/software  (e.g.,  usage  of  FFT  for  the  computation  of  periodogram) 
will  be  sufiScient  for  implementation  of  the  proposed  cyclic  algorithm. 

Periodogram  is  one  of  the  most  commonly  used  spectrum  estimation  techniques.  But  it  is  well  known  that 
periodogram  can  not  resolve  closely  spaced  frequencies  or  angles  of  arrivals.  The  main  goal  of  this  research  was 
to  develop  an  algorithm  that  will  rely  on  periodogram  but  at  the  same  time  provide  high-resolution  estimates 
by  maximizing  the  maximum-likelihood  criterion.  The  proposed  cyclic  approach  achieves  these  goals  because  it 
requires  optimization  with  respect  to  only  one  frequency  at  every  estimation  cycle.  The  method  is  iterative  and 
recursive  and  relies  on  the  knowledge  of  approximate  prior  estimates  of  the  frequencies  (or  regions  of  interest) 
which  may  be  easily  obtained  from  the  periodogram  peaks.  The  estimates  obtained  using  the  algorithm  are 
unbiased  and  follow  the  Cramer-Rao  lower  bound  up  to  OdB  SNR. 

Section  -  3.2  :  A  Parsoneter  Adaptive  Simulated  Annealing  Algorithm  Applied  to  EVequency 
Estimation 

In  this  part  of  the  research,  a  faster  simulated  annealing  algorithm  is  proposed  [8]  and  applied  to  the 
frequency  estimation  problem.  The  proposed  annealing  scheme  is  based  on  a  cooling  schedule  which  is  parsuneter 
adaptive.  In  the  existing  annealing  schemes,  the  temperature  parameter  is  predetermined  for  every  iteration  step 
and  is  independent  of  the  unknown  parameter  values.  In  the  proposed  scheme,  the  cooling  temperature  is  made 
proportional  to  the  deviation  of  each  individual  parameter  at  the  earlier  iteration  step.  The  other  key  difference 
is  that  the  proposed  scheme  never  aiccepts  a  higher  energy  level  and  remains  at  the  present  lower  energy  position. 
Instead,  the  Boltzmaim  Distribution  is  used  to  accept  a  larger  cooling  temperature  which  is  same  as  performing 
parameter  search  with  a  broader  search  space.  This  algorithm  was  applied  to  the  non-linear  maximum-likelihood 
error  criterion  that  arise  in  frequency  estimation  and  simulations  confirm  that  the  proposed  scheme  converses  to 
the  minimum  energy  level  in  much  fewer  iteration  steps  when  compared  to  an  existing  fast  annealing  algorithm 
due  to  Szu. 

Section  •  3.3  :  One  Step  Estimation  of  Angles-of- Arrival  of  Wideband  Sources 

A  high  resolution  algorithm  for  estimating  the  angles  of  arrivals  of  multiple  wideband  sources  is  studied  for 
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this  part  of  the  work  [9].  The  algorithm  is  effective  for  a  dense  and  equally  spaced  array  structure  where  a  bilinear 
transformation  is  utilized  in  the  frequency  domain  for  combining  the  signal  subspaces  at  different  frequencies  for 
coherent  processing.  When  compared  with  existing  coherent  approaches,  the  algorithm  is  non-iterative  in  the 
sense  that  all  the  arrival  angles  can  be  estimated  in  only  one  step  of  the  algorithm.  Existing  algorithms  can  only 
estimate  the  angles  of  a  cluster  of  sources  in  a  particular  direction.  The  proposed  algorithm,  unlike  the  existing 
ones,  does  not  need  the  knowledge  of  the  initial  estimates  of  the  arrival  angles.  The  work  reported  here  is  a 
variation  of  some  earlier  work  by  the  author.  Instead  of  using  generalized  eigendecomposition  or  matrix-pencil 
method,  here  we  pre-  or  post-multiply  the  signal-subspace  matrix  with  the  noise  matrix.  This  enables  us  to  use 
regular  eigendecomposition  routine  to  estimate  the  source  angles.  It  is  also  shown  that  it  may  be  numerically 
more  stable  if  the  coherent  combination  is  not  focused  in  the  center  frequency  the  numerical  value  of  which  could 
be  very  large.  The  new  focusing  matrix  given  here  allows  to  focus  independent  of  the  center  frequency. 

The  original  intention  was  to  utilize  structured  matrix  approximation  appro2ich  to  this  problem.  This  task 
has  been  accomplished  but  the  eigendecomposition  based  method  given  here  performed  considerably  better. 

Section  -  3.4  :  An  Order-Recursive  approach  for  Parametric  Bispectrum  Estimation 

Order  recursive  computation  of  AR  parameters  from  cumulants  is  given  [10].  The  Cumulant  matrix  arising 
in  AR-Bispectrum  estimation  may  not  be  either  Toeplitz  or  symmetric.  In  such  cases,  it  is  shown  that  using  a 
block  matrix  inversion  formula  due  to  Frobenius  and  Schur,  the  inverse  of  the  p-dimensional  cumulant  matrix 
can  be  updated  from  the  (p  —  l)-dimensional  inverse  with  0(p®)  operations.  When  compared  to  commonly  used 
standard  batch-mode  computation,  the  proposed  algorithm  reduces  the  computational  requirement  for  order- 
recursive  calculation  of  the  AR-parameters.  When  the  cumulant  matrix  is  non-symmetric  Toeplitz  also,  further 
reduction  in  computation  is  obtained  using  an  algorithm  due  to  IVench. 

Section  -  3.5  :  Phoneme  and  Vowel  Recognition  Using  Timci-Delay  Neural  Network 

In  this  part  of  research,  Time-Delay  Neural  Network  (TDNN)  architecture  has  been  used  for  speaker  indepen¬ 
dent  recognition  of  Phonemes  and  Vowels  of  isolated  words  [11,  12].  One  of  the  limiting  factors  of  many  existing 
speech  recognition  algorithms  is  the  requirement  of  precise  temporal  alignment.  Segmentation  and  dynamic  time¬ 
warping  are  usually  performed  to  solve  this  aligiunent  problem.  Dynamic  time- warping  at  each  speech  segment  is 
computation  intensive.  Also,  segmentation  may  be  erroneous  in  itself  and,  when  in  error,  will  cause  recognition 
failure  due  to  a  mismatch.  Furthermore,  it  is  advantageous  of  recognition  algorithm  to  look  at  multiple  time 
frames  at  one  time  in  order  to  madre  use  of  the  inter-&ame  relationships  and  differences  in  the  input  features. 
TDNN  has  the  ability  to  represent  relationships  between  events  in  time  and  at  the  same  time  it  allows  for  invari¬ 
ance  of  these  events  under  translation  in  time.  With  this  translation  invariance,  a  Time  Delay  Neural  network 
does  not  require  precise  temporal  alignment,  therefore  the  network  is  able  to  simply  scan  the  input  features  for 
clues.  This  is  a  necessary  requirement  for  efficient  continuous  speech  recognition  addressed  in  this  work. 

Our  goal  was  to  improve  the  performance  of  TDNN  by  increasing  the  amount  of  data  supplied  to  the  network. 
This  was  achieved  by  including  the  LPC  coefficients  along  with  the  FFT  bin  energies.  We  have  2Jso  trained  the 
network  with  utterances  having  variable  durations.  We  feel  that  it  is  an  important  aspect  due  to  the  extreme 
variability  in  speaking  rate  of  different  speakers  at  different  situations.  Also,  restricting  the  utterance  length  to 
ISOms  (as  it  seems  to  have  been  proposed  by  Waibel  et  al)  in  order  to  make  a  decision  about  a  vowel  may  limit 
the  network’s  performance  because  depending  on  speaking  style  and  the  spoken  word  the  selected  150ms  may 
not  contain  all  the  key  information  to  recognize  the  vowel.  We  have  trained  the  network  with  multiple  English 
speakers  and  have  obtained  100%  recognition  rate. 
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Section  -  3.6  :  Speech  Analysis  and  Synthesis  with  Non-Linear  Prediction 

One  of  the  common  assumptions  in  speech  has  been  that  speech  production  and  perception  are  essentially 
linear  processes  and  hence  one  can  accurately  model  speech  data  using  ‘linear  prediction’  based  methods.  Phys¬ 
iological  evidence  indicate  that  some  nonlinear  operation  does  occur  in  speech  production  and  perception.  It 
is  also  known  that  linear  models  perform  poorly  for  certain  types  of  speech.  Based  on  these  observations,  for 
this  final  part  of  research  we  consider  the  applicability  of  cert2un  parametric  nonlinear  models  for  the  purpose 
of  analysis/prediction/synthesis/coding  of  speech  signals  [13,  14].  To  the  best  of  our  knowledge,  these  models 
have  not  yet  been  exploited  for  speech  modeling.  Several  algorithms  for  simultaneous  estimation  of  the  non¬ 
linear  as  well  as  the  linear  prediction  peirameters  of  speech  signals  have  been  studied  and  more  work  is  under 
way.  These  studies  indicate  that  the  nonlinear  models  retain  substantially  more  information  when  compared  to 
linear-only  models.  Experiments  on  telephone  quality  speech  data  clearly  and  consistently  indicate  that  there  is 
a  significant  reduction  in  the  prediction  error  when  the  bilinear  prediction  components  are  included  along  with 
the  LPC  part.  The  results  in  this  work  may  have  significant  effect  on  the  performance  accuracy  of  any  speech 
recognition /synthesis/ coding  system  that  currently  relies  on  linear  prediction  only.  . 
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CHAPTER  2 

DIGITAL  FILTERS  :  REALIZATION  AND  SYNTHESIS 
SECTION  2.1  ;  SEPABABLE  AND  EXACT  REALIZATION  OF  2-D  HR  FaTERS 
Summary 

A  method  for  exact  realization  of  2-dimensionaI  digital  HR  filters  using  sepeirable  1-dimensional  modules  is 
presented.  The  proposed  design  utilizes  a  theorem  on  separability  of  multivariable  polynomials  for  writing  a  2-D 
polynomial  as  sum-of-product  of  1-D  polynomials  of  successively  diminishing  orders.  The  proposed  realization 
has  lower  complexity  compared  to  most  existing  methods. 

I.  INTRODUCTION 

Exact  realization  of  2-D  HR  filters  using  separable  1-D  filter  is  one  of  the  most  elusive  problems  in  digital  signal 
processing.  In  development  of  stability  tests  and  for  stabilization  of  1-D  digit2Ll  filters,  factorization  plays  an 
important  role.  However,  it  is  well  recognized  that  no  fundamental  theorem  of  algebra  on  factorization  exists 
for  polynomials  in  two  independent  variables.  This  lack  of  a  corresponding  theorem  has  been  aptly  phrased  as 
“a  fundamental  curse”  of  2-D  filtering.  In  the  last  two  decades  considerable  research  effort  has  been  devoted  to 
the  development  of  algorithms  for  2-D  polynomial  factorization.  In  [1],  lYeitel  and  Sh2uiks  proposed  a  scheme 
for  approximate  implementation  of  2-D  planar  (2-D  FIR)  filters  with  my.  n  coefficients  in  terms  of  k  number  of 
separable  blocks  each  requiring  (m  -f-  n)  coefficients.  In  [2], [3]  a  separability  theorem  of  multivariable  polynomial 
is  presented  and  using  the  separability  result  for  the  2-D  case,  Suresh  and  Shenoi  [4]  presented  an  exact  realiza¬ 
tion  of  2-D  planar  filters  by  separable  1-D  filters.  In  [5],  Venetsanopoulos  and  Mertzios  used  a  general  matrix 
decomposition  theorem  for  exact  decomposition  of  a  general  2'D  real  rational  transfer  function.  It  was  shown 
that  in  the  decomposed  form,  each  rational  function  depends  on  only  one  of  the  two  independent  variables.  Nikias 
et  al  [6]  showed  that  LU  decomposition  can  be  used  for  exact  implementation  of  2-D  rational  transfer  functions. 
Moreover,  from  the  hardware  point  of  view,  the  LU  decomposition  is  considerably  more  efficient  compeired  to  the 
realizations  based  on  Singular  Value  Decomposition  (SVD)  and  Jordan  form  Decomposition  (JD)  [7].  For  more 
references  on  2-D  filter  implementation,  see  [5]  and  [6]. 

In  this  work,  we  utilize  the  separability  theorem  of  [2], [3]  to  develop  exact  realizations  of  2-D  digital  filters  with 
rational  transfer  function  using  separable  and  parallel  blocks  of  1-D  filters  of  successively  reducing  orders.  For  the 
planar  case,  the  proposed  approach  is  essentially  similar  to  the  one  given  in  [4],  except  that  our  implementation 
requires  reduced  number  of  delay  elements.  The  proposed  approach  for  the  planu  case  is  then  extended  for 
implementation  of  rational  trsmsfer  functions  (2-D  IIR  filters)  by  incorporating  feedback  and  cascsuling.  For  the 
full  rank  planar  coefficient  matrix  case,  the  given  approach  has  reduced  hardware  requirement  than  SVD  and  JD 
approaches  and  same  complexity  as  LU  decomposition  approach. 

In  the  proposed  scheme,  the  filter  coefficients  are  found  by  simple  operations  of  matrix  addition,  subtraction 
and  multiplication.  For  rank  deficient  coefficient  matrices  also,  the  proposed  scheme  has  reduced  hardware 
complexity  than  the  SVD  and  JD  approaches.  Similar  to  the  matrix  decomposition  based  methods  [5], [6],  the 
proposed  approach  also  possesses  high  degree  of  modularity  and  modern  VLSI  technology  can  be  utilized  for 
efficient  hudware  realization.  The  results  presented  in  the  sequel  are  a  more  formal  and  detruled  presentation  of 
[8]. 

This  Section  is  arranged  as  follows.  In  Subsection  II,  the  problem  is  formulated  and  the  planar  full  rank  case 
is  first  treated  by  considering  the  numerator  of  the  2-D  rational  transfer  function.  In  Subsection  III,  the  separable 
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IIR  design  is  considered.  The  rank  deficient  case  is  briefly  outlined  in  Subsection  FV.  Finally,  in  Subsection  V,  an 
examples  is  given  to  illustrate  the  proposed  technique. 

n.  Separabie  Realization  of  2-D  FIR  Piltee 
A  2-D  rational  transfer  function  has  the  following  general  form, 

Iff  \  _  9i2l,Z2)  _  I3"=o  ^  127=0  ^  ^  (2  1) 

-  p(zi,z2)  i:r=o'Er=v‘p(^j>r‘^2‘^' 

Let  us  consider  the  numerator  polynomial  q(zi ,  rj)  first.  Its  impulse  response  can  be  represented  by  the  following 
ni  X  mi  matrix  ^ 

rg(ni  -  l,mi  -  1)  9(ni-l,mi-2)  ...  g(Tii-l,0)' 

Q  A  (2-2) 

—  ^(l,mi  — 1)  9(1,  mi  — 2)  ...  9(1>0) 

9(0,  mi  — 1)  9(0imi— 2)  ...  9(0,0) 

such  that  the  polynomial  9(21 , 22)  can  be  written  as 


9(21,22)  A  zJQx2, 


where  zi  and  *2  are  defined  as, 


*1  A  ....r-i  1]’'  and  *2  A  [. 


(„,_1) 


and  denotes  the  transpose  operation.  We  first  consider  the  case  of  Q  being  a  full  rank  matrix. 

In  general,  the  coefficient  matrix  Q  is  not  separable,  i.c,  one  cannot,  in  general,  factor  9(21, 22)  in  terms  of 
1-D  2-domain  polynomials.  But  following  the  separability  testing  criterion  [2], [3],  the  matrix  Q  can  be  expressed 
as  the  sum  of  a  separable  matrix  Qo  and  an  error  matrix  Eo  which  again,  in  general,  is  not  separable,  i.e.,  [4] 

q{i,j)  =  9o(«,i)  +  co(«,;)  (2.5) 


where. 


9o(ni  -  1,  j)  =  9(”i  -  l.i).  0  <  jf  <  mi  -  1 
9o(»,mi  -  1)  =  9(1,  mi  -  1),  0  <  *  <  ni  -  1 
9o(»,i)  =  9(t,mi  -  l)9(ni  -  l,j), 

1  <  j  <  mi  —  1,  1  <  t  <  TJi  —  1. 


(2.6a) 

(2.66) 


The  indices  of  the  matrix  Qo  match  those  of  Q  as  in  (2.2).  Note  that  according  to  the  separability  criterion  [2], [3] 
any  (ni  x  mi)  2-D  planar  filter  G  is  separable  if 

g{i,0)g{Q,j)  =  9{i,j)  0  <  »  <  ni  -  1,  0  <  j  <  mi  -  1.  (2.7) 

Since  the  elements  9o(»,i)’8  follow  (2.7),  the  2-D  filter  having  impulse  response  matrix  Qo  is  separable  by  con¬ 
struction.-  Let  the  separable  form  of  the  transfer  function  be  expressed  as 

n,  — Imi  — 1 

qoizuza)  =  53  iZ  9o(t,i)2r‘*2^  =  cS(2l)rJ(22), 

•=0  >=0 


(2.8) 
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where, 

ni— 1  mi— 1 

^  ^  Co(‘)^r‘  an<l  rj(z2)  A  ^  ^J(i)z2^  (2.9) 

i=0  j=0 

The  coefficients  Cq  (t)  for  i  =  0, . . . ,  ni  —  1  and  rj  (j)  for  j  =  0, . . . ,  mj  —  1  are  constructed  as, 

Co(ni  -  1)  =  1,  and  Co(i)  =  — 1),  for  i  =  0, ...,ni-2  (2.10a) 

and 

»’o(»«i  -  1)  =  1.  and  rj(i)  =  g(f,  m  -  1),  for  j  =  0, . . . ,  mi  -  2,  (2.106) 

where  the  superscript  n  indicates  that  the  decomposition  is  for  the  denominator  polynomial.  Note  that  q(ni  — 
1,  mi  —  1)  is  assumed  to  be  non-zero  without  any  loss  of  generality.  If  it  happens  to  be  zero  then  the  first  column 
(row)  of  Q  may  be  interchanged  with  any  column  (row)  with  non-zero  leading  coefficient  and  at  the  same  time 
interchanging  the  corresponding  powers  of  zj  (^i)  in  the  vector  Z2  (*i)  •  Further,  the  matrix  Q  can  be  normalized 
such  that  the  (1,1)  element  is  equal  to  1.  In  (2.10)  above,  we  have  assumed  that  the  necessary  column  (row) 
permutation  and/or  normalization  has  already  been  performed. 

The  error  matrix  Eo  €  R"*’*”**,  formed  with  the  elements  of  eo(*,i)  in  (2.5)  has  the  following  form. 


■0  0  ...  0  - 

0  co(ni  -  2,mi  -  2)  ...  co(ni-2,0) 

.0  eo(0,mi-2)  ...  eo(0,0)  . 


(2.11) 


Now,  let  us  call  the  ni  —  1  x  mi  —  1  non-zero  submatrix  at  the  lower  right  hand  corner  of  Eo  as  Eo  and  let 
be  the  system  function  corresponding  to  it.  Then, 


»ii— 2mj— 3 

eo(^i.^2)  =  53  £  ^i''«reo(»,i).  (2.12) 

t=0  jsO 

Hence,  combining  (2.8)  and  (2.12),  the  numerator  can  now  be  written  as 

9(^1, 2^2)  =  cj(zi)rj(z2)  +  eo(zi,Z2)-  (2.13) 

Now  starting  from  the  matrix  Eo  which,  in  general,  is  non-separable,  we  can  proceed  similarly  as  we  did  for  the 
matrix  Q  and  form  another  summation  of  a  separable  and  a  possibly  non-separable  filter,  ».c.,  similar  to  (2.8) 
and  (2.13),  we  can  express  60(21,22)  as. 


eoizi,Z2)  =  qiizi.Zi)  +  ei{zi,Z2)  =  cy(2i)rj(22)  -I-  61(21,22),  (2.14) 

where,  6^(21)  and  r^izz)  are  defined  similarly  as  in  (2.9)  and  (2.10)  except  that  their  orders  are  reduced  by  one. 
Continuing  that  process,  eventually  we  will  get, 

mi  — 1  mi  — 1 

9(21,22)  =  53  «<(*1.^3)  =  2  c"(2i)rr(z2).  (2.15) 

<=0  1=0 

This  is  the  separable  form  we  were  seeking  for  the  numerator  polynomial.  Note  that  we  have  assumed  ni  >  mi  (= 
rank  of  Q),  without  any  loss  of  generality.  Also  note  that  c?(2i)  and  r,^'(22)  are  1-D  filters  with  orders  ni  -  1  -  1 
and  mi  —  1  —  i,  respectively,  ».e.,  the  filter  orders  are  successively  reducing  with  increasing  values  of  t.  The 
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separable  form  of  9(21, 23)  in  equation  (2.15)  implies  that  the  planar  2-D  filter  9(21, 23)  can  be  implemented  with 
parallel  blocks  of  separable  I'D  filters.  The  modularity  of  this  decomposition,  for  the  purpose  of  implementation 
is  obvious.  Note  also  that  in  SVD  :md  JD  approaches  [5],  each  parallel  block  would  usually  have  two  cascaded 
1-D  filters  of  orders  nj  —  1  and  mi  —  1,  respectively,  whereas  in  the  proposed  approach,  the  orders  of  the  1-D 
filters  in  successive  puallel  bremches  diminish,  thereby  reducing  the  complexity  and  hardware  requirement  in  the 
filter  implementation.  Specifically,  SVD  and  JD  approaches  would  require  mi  (mi  +  ni)  coefGcients  whereas  the 
present  approach  would  require  mi(ni  +  1)  coefficients  which  is  the  same  as  that  of  the  requirement  for  the  LU 
decomposition  case.  Hence  the  hardware  requirement  for  LU  decomposition  approach  [6]  is  exactly  same  as  in 
the  present  case. 

It  may  be  pointed  out  here  that  the  above  derivation  is  essentially  simil2u^  to  the  one  given  in  [4],  except  that 
they  considered  only  the  planar  case  and  started  with  a  permuted  form  of  the  Q  matrix.  Specifically,  in  [4],  the 
planar  matrix  is  written  as 


9(0,0) 

9(1.0) 


9(0.1) 

9(1.1) 


[9(711-1,0)  9(ni-l,l) 

such  that  the  polynomial  9(21 , 23)  may  be  expressed  as 


9(0.mi-l)  ■ 

9(1,  mi  -  1) 

9(ni  -  l,mi  -  1).. 


(2.16) 


9(21.  Z2)  A  zfQzs, 


(2.17) 


where  zi  and  £3  are  defined  as. 


zi  A  [1  rj"*  2f*  •••  and  £3  A  [1  zj*  ••• 

Eventually,  the  separable  form  equivalent  to  (2.15)  was  found  to  be. 


(2.18) 


9(^1, i^2)  =  9i(zi.  23)2^22*  =  ^r*22'^(^l)^(^2). 


(2.19) 


For  the  realization  in  (2.15),  the  highest  powers  of  delays  are  accounted  for  in  the  first  recursion  and  subsequent 
recursion  have  lower  powers  of  delay.  However,  the  realization  (2.19)  retains  the  highest  powers  of  delay  till  the 
last  recursion.  Therefore,  the  realization  in  (2.19)  would  require  2mi  extra  delays  when  compared  to  the  one  in 
(2.15).  This  is  evident  from  the  zf 'zj*  term  in  (2.19). 

The  separable  form  of  the  numerator  polynomial  is  now  complete  and  the  extension  to  the  denominator 
realization  is  given  next. 

III.  Separable  Implementation  of  2-D  IIR  Filter 
Let  us  first  write  the  denominator  polynomial  as 

p(2i,22)  =  K  +  /(zi.rz)  (3.1) 

where  the  polynomial  /(z;  ,23)  has  no  constant  term.  Next  the  all-pole  part  of  17(21,22)  in  (2.1)  is  rewritten  as, 


Hp{zi,Z2)  = 


p(2i,22)  K  +  7(21,22)  1  +  7(21,22) 


(3.2) 
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which  is  a  feedback  network  with  ^  in  the  forward  branch  and  /(zi ,  Z2)  polynomial  in  the  feedback  branch  as 
shown  in  Fig.  1.  Our  contention  here  is  that  /(zi.zz)  which,  in  general,  is  not  separable,  can  again  be  expressed 
as  a  sum  of  separable  polynomials  following  the  same  steps  used  in  obtaining  (2.15)  i.e., 

m>— 1 

nzuZ2)  =  52  (3.3) 

i=0 

where  d  indicates  that  the  decomposition  is  for  the  denominator  polynomial.  Next,  the  contributions  from  the 
numerator  and  the  denominator  can  be  cascaded  as  shown  in  Fig.  2  for  an  exact  realization  of  the  2-D  rational 
tramsfer  function  in  terms  of  only  separable  1-D  modules.  For  realizable  decomposition,  it  should  be  ensured  that 
there  must  not  be  any  delay-free  loops  in  the  feedback  path.  This,  in  turn,  can  be  ensured  by  selecting  cf{zi) 
and  rf{z2)  such  that  for  any  value  of  i,  both  cf(zi)  and  rf(z2)  do  not  have  a  constamt  term.  This  is  discussed 
later  in  Subsection  V. 

IV.  The  Rank  Deficient  case 

If  the  coefficient  matrix  Q  is  not  full  rank,  then  SVD  and  JD  approaches  would  require  ifc(mi  +  ni)  coefficients  for 
the  implementation,  where  k  is  the  rank  of  Q.  Whereas  the  LU  decomposition  would  require  k{ni  +Tni  —  k  +  l) 
coefficients.  However  the  proposed  direct  approach  would  still  require  mi(ni  -f  1)  coefficients  unless  at  some  stage 
£,-  is  a  zero  matrix.  Hence,  the  direct  implementation  of  the  proposed  method  may  not  always  be  economical  when 
compared  to  SVD  or  JD  approaches  and  it  will  almost  always  have  more  hardware  requirement  when  compared 
to  the  LU  approach.  Hence,  it  is  recommended  that  elementary  row  and  column  operations  be  performed  on  the 
Q  matrix  so  that  its  first  k  principal  minors  are  non-zero  [6].  If  Q  can  be  reduced  to  LU  decomposable  form 
after  the  row/colunm  operations,  then  we  can  still  apply  the  proposed  algorithm.  Note  that  in  this  case  Et-i  so 
obtained  will  be  a  zero  matrix.  Therefore,  the  number  of  coefficients  will  again  be  fc(ni  +  mi  —  ib  -H)  i.e.,  it  will 
have  lower  hardware  requirement  than  SVD  and  JD  approaches  and  the  same  hardware  complexity  as  the  LU 
approach.  It  should  be  pointed  out  that  the  main  advantage  of  the  proposed  algorithm  over  the  LU  approach  is 
that  it  produces  the  filter  of  same  complexity  as  the  LU  decomposition  without  resorting  to  LU  decomposition 
of  Q. 

V.  AN  Example 

In  this  Subsection,  we  will  illustrate  the  proposed  scheme  by  means  of  a  numerical  exrunple.  The  transfer  function 
ixsed  for  this  example  has  been  taken  form  [9].  H{z)  = 

?(^ii  ^j)  —  3zj  Z2  +  7Z|  ^2  ^  d"  2z2  ^  +  7zi  ^Zj  ^  +  9zj  ^Z2  *  +  Szj  *  +  3zj  ^  -i-  3zj  ^  -b  1 
/(•*i,  ^z)  =  3zi"^  -I-  zf’  +  4zf  *  -I-  fizf^zj  ^  +  2zf  *zj  *  -b  3zf  *  -b  3zf^z^’  -b  z^^z^^. 

Following  the  steps  outlined  in  the  previous  Subsection,  we  can  write 

r3  7  3] 

q{z:i,Z2)  =  zJ  7  9  3  Z2  a  Zi^Qz2 

[2  3  ij  ~ 


Q  =  Qo  +  Eo 

'17/31]  ro  0  O' 

=  3  7/3  49/9  7/3  +  0  -22/3  -4  . 

2/3  14/9  2/3  J  [0  -5/3  -1 
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,  we  can  write  the  numerator  polynomial  as 

—  (^1  +  7/3Zi  ^  +  2/3)  (322  ^  +  722  ^  +  3)  "I"  ®o(i^1*»22*) 

where  co(2i,  23)  =  — (22/32j‘*22  ^  +  42f  ^  +  5/325  ^  +  1.  Following  the  same  steps  on  Ej,  it  can  be  shown  that 
q(zi ,  22)  =  (zf*  +  7/32i-^  +  2/3)  (322-*  +  72,  *  +  3)  -  (zf*  +  5/22)  (22/32,  *  +  4)  -  1/11 

Next,  we  perform  a  similar  reduction  on  the  denominator.  The  polynomial  /(zi,  23)  can  be  written  in  the  matrix 
vector  form  as 

[12  1] 

f(zi,Z2)  =  zf  3  6  3  Z3  A  afPzs.  (4.1) 

_3  4  oj  ~ 

Note  that  due  to  realizability  constraints  [6], [10],  we  cannot  have  any  delay-free  loops  in  the  feedback  path. 
Accordingly,  we  must  make  some  minor  modifications  to  the  method  used  for  realization  of  the  numerator  poly¬ 
nomial.  In  particular,  it  is  necessary  to  permute  the  rows  (columns)  of  the  matrix  F  such  that  the  resulting 
realization  obtained  by  applying  the  proposed  method  does  not  contain  any  delay-free  loop.  It  should  be  pointed 
out  that  such  realizations  always  exist  provided  that  the  element  p(0,0)  in  (2.1)  is  non-zero  [6]. 

Applying  the  necessary  permutation  (interchanging  the  first  and  the  third  columns  of  the  matrix  F),  we  cam 
rewrite  (4.1)  as 

[1  2  1]  [  1  ■ 

/(2i,23)  =  [zr^  zf*  1]  3  6  3  z,-'  (4.2) 

.0  4  3j  [zj’. 

Next,  using  the  proposed  technique,  we  cm  obtain  a  realizable  form  for  /(zi ,  23)  given  by 

/(zi ,  22)  =  (3zf  ‘  +  zj-’)  (1  +  zf  ^)*  +  (422  *  +  32,  *) . 


Defining  Eo  =  [^5^3^  -t] 


Note  that  for  this  example  several  other  realizable  forms  do  exist.  However,  the  above  form  requires  the  least 
number  of  delay  elements. 

Next,  we  compare  the  hardware  requirement  for  the  proposed  realization  with  several  existing  ones. 


*  Note  that  the  method  in  [4]  is  applicable  to  FIR  filters  only. 
VI.  DISCUSSION 


The  modularity  in  ^(zi,  23)  can  be  observed  in  (2.15)  which  can  be  built  with  separate  1-D  modules  c"(2i)’s  and 
’*?(^j)’®  only-  same  will  be  true  for  the  feedback  path  to  implement  /(zi ,  23).  The  individual  modules  c"(2i)’s 
and  r{*(22)’8  can  again  be  factored  into  first  and  second  (to  have  real  coefficients  only)  order  polynomials  because 
they  are  functions  of  single  independent  variables  only.  This  implies  that  the  complete  2-D  transfer  function  can 
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be  built  with  a  number  of  first  and  second  order  1-D  blocks.  Recent  advances  in  VLSI  methodologies  can  be 
utilised  for  building  such  2-D  filters. 
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SECTION  2.2  :  OPTIMAl  IDENTIFICATION  OF  DISCRETE-TIME  SYSTEMS  FROM  IMPDISE 

Response  Data 

SUMMARY 

An  optimal  algorithm  for  estimation  of  the  parameters  of  rational  transfer  functions  from  prescribed  impulse 
response  data  is  presented.  A  major  contribution  of  this  work  is  that,  for  the  first  time,  an  error  minimization 
criterion  has  been  theoretically  derived  which  is  uniformly  applicable  to  rational  models  with  arbitrary  numbers 
of  poles  and  zeros.  Existing  methods  either  modify  the  true  non-line2ir  error  criterion  in  the  theoretical  derivation 
or  require  the  transfer  function  model  to  have  exactly  one  less  number  of  zeros  tbsin  poles.  In  the  proposed 
algorithm,  the  transfer  function  coeflScients  are  estimated  by  minimizing  the  /2-norm  of  the  eroct  model  fitting 
error.  It  is  shown  that  the  complete  basis  space  orthogonal  to  the  model  fitting  error  can  be  constructed  with 
the  coefficients  of  the  denominator  polynomial  only.  The  multidimensional  non-linear  error  criterion  is  decoupled 
into  a  purely  linear  and  a  purely  nonlinear  subproblem.  Global  optimality  properties  of  the  decoupled  estimators 
are  established.  The  inherent  mathematical  structure  in  the  non-linear  subproblem  is  exploited  in  formulating 
an  efficient  iterative  computational  algorithm  for  its  minimization.  The  proposed  algorithm  provides  a  powerful 
and  comprehensive,  theoretical  as  well  as  computational  framework  for  modeling  general  pole-zero  and  all-pole 
systems  from  prescribed  impulse  response  data.  It  is  shown  that  the  algorithm  can  be  utilized  for  identifying 
all-zero  systems  also.  The  effectiveness  of  the  algorithm  is  demonstrated  with  several  simulation  examples. 

1.  INTRODUCTION 

Identification  of  unknown  discrete-time  linear  systems  is  a  fundamental  problem  in  signal  processing.  Over 
the  last  several  decades  this  problem  has  remained  among  the  most  active  and  important  research  areas  [1- 
7,  10,  13-17,  24-34,  37,  55].  Depending  on  the  application,  design  specifications  or  the  information  available 
about  the  unknown  system,  linear  system  modeling  can  be  broadly  categorized  into  two  classes,  namely,  non- 
parametric  and  parametric.  Historically,  non-parametric  modeling  had  gained  early  populuity  due  mainly  to 
relatively  simpler  mathematical  and  computational  complexity.  Classical  non-parametric  models  include  impulse 
response  sequence,  Fourier-domain  representation  or  frequency  response.  Power  Spectr2il  Density  or  Periodogram, 
autocorrelation  sequence.  Characteristic  Function  and  others  [8-12,  57,  58].  The  major  drawback  of  classical  non- 
parametric  models  is  that,  more  often  than  not,  these  representations  are  theoretically  infinite  in  extent.  Hence, 
though  sound  and  simple  in  theory,  these  models  may  encounter  serious  data-bandling  problems  in  practice. 
Furthermore,  in  various  non-parametric  representations,  the  utilization  of  finite  amount  of  data  may  cause  limited 
resolution,  spurious  estimates,  oversboot/oscillation  (e.y.,  Gibb’s  Phenomenon),  broadening  of  main-lobe  width 
and  other  well-known  problems  associated  with  data  truncation  [7,  9,  11,  12,  57,  59]. 

Parametric  models  overcome  the  infinite  dimensionality  problem  of  non-parametric  models  by  representing 
the  system  in  terms  of  only  a  finite  number  of  parameters  or  coefficients.  Pole-zero  models  or  transfer  functions, 
linear  differential  and  difference  equations,  Wiener/Kalman  filters,  exponential  models,  Markov  models,  finite 
automata,  state-space  representations  etc.  are  some  of  the  well  known  parametric  system  models.  The  primary 
advantage  of  these  models  is  their  ability  to  describe  a  possibly  infinite  dimensional  system  accurately  and 
completely  in  terms  of  a  parsimonious  representation  that  is  dependent  only  on  a  small  number  of  parameters. 
Some  of  the  problems  encountered  in  parametric  modeling  include  the  proper  choice  of  the  type  of  the  model  that 
can  appropriately  describe  the  underlying  system,  accurate  determination  of  model  order  as  well  as  the  estimation 
of  the  parameters  defining  the  model.  Over  the  last  few  decades  these  problems  have  been  addressed  in  a  large 
body  of  work  [1-17,  24-50,  52,  53,  55,  57-59]. 

Among  the  parametric  models  mentioned  above,  the  pole-zero  or  rational  transfer  function  model  is  one  of 
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the  most  effective  and  practical  representations  in  digital  signal  processing  literature.  Over  the  years,  research 
on  parameter  identification  of  rational  transfer  functions  has  evolved  in  four  major  directions.  The  first  among 
the  following  four  approaches  deals  with  the  modeling  of  random  processes  whereas  the  later  three  address 
deterministic  problems.  Firstly,  in  statistical  and  modern  spectrum  analysis  literature,  unknown  systems  or  rather 
their  Power  Spectral  Densities  (PSD)  are  modeled  using  AR,  MA  and  ARMA  model  identification  methods  [8, 
9,  10,  11,  52].  The  system  parameters  in  these  cases  are  estimated  from  the  observed  output  data  record  only. 
Since  the  input  to  such  systems  can  not  be  observed,  white  noise  is  assumed  to  be  the  input  for  modeling 
convenience.  Secondly,  in  Control  Systems,  the  input  and  output  signals  of  a  plant  are  usually  measurable. 
Hence,  the  system  parameters  are  estimated  from  the  known  input/output  record  [1-3,  13-17,  32,  55].  Thirdly, 
in  Digital  Filter  design,  the  filter  specifications  may  be  given  in  the  frequency  domain  in  terms  of  the  magnitude 
and  phase  responses  in  the  pass/stop  bands.  Given  such  clear-cut  specifications,  standard  methods  are  available 
[7,  12,  59]  for  designing  HR  filters  with  classical  structures  such  as  Butterworth,  Chebyshev,  Elliptic  and  others. 
For  arbitrary  or  non-classical  frequency  domain  specifications,  least-squares  matching  leads  to  gener2Ll  non-linear 
optimization  algorithms  [25,  26,  31,  33].  Fourthly,  and  again  within  the  deterministic  design  context,  in  some 
applications  the  least-squares  fit  of  a  desired  time-domain  impulse  response  may  be  required  or  non-classical  filter 
specifications  with  arbitrarily  shaped  pass/stopband  response  may  be  desired  of  an  IIR  filter  or  the  unknown 
system  may  be  known  to  contain  unequal  numbers  of  poles  and  zeros.  In  such  cases,  it  is  more  advantageous 
and  appropriate  to  fit  the  impulse  response  data  directly  to  a  rational  transfer  function  model  by  estimating  the 
unknown  parameters  of  the  model. 

The  modeling  problem  addressed  in  this  Section  belongs  to  the  fourth  category  mentioned  above.  Specifically, 
given  a  desired  impulse  response  the  goal  here  is  to  estimate  the  coefficients  of  the  numerator  and  denominator 
polynomials  of  the  unknown  rational  transfer  function  by  minimizing  the  model  fitting  error  in  the  least-squares 
sense.  It  is  well-known  that  this  is  a  multidimensional  nonlinear  optimization  problem  [1-7,  10,  24-31,  34-37, 
41-43,  55].  There  have  been  a  substantial  amount  of  work  on  this  particular  problem  starting,  most  probably, 
with  the  work  of  Kalman  [1]  where  a  linearized  and  approximate  ‘equation  error’  was  minimized.  In  [3],  Steiglitz 
and  McBride  (SM)  proposed  a  linearized  ‘fitting  error’  minimization  criterion  whereas  Shanks  in  [2]  and  Burrus 
and  Parks  in  [30]  proposed  a  couple  of  two-step  procedures  where  the  denominator  polynomial  was  first  estimated 
by  minimizing  an  ‘equation  error’  and  then  that  denominator  was  utilized  to  obtrun  the  numerator  by  minimizing 
a  linearized  ‘fitting  error’.  These  methods  can  not  be  expected  to  produce  optimal  estimates  because  the  exact 
model  fitting  error  norm  is  not  minimized.  For  the  special  case  of  strictly  proper  transfer  function  models,  i.e., 
when  the  order  of  the  numerator  polynomial  is  exactly  one  less  than  the  denominator  polynomial  order,  an 
optimal  solution  that  minimizes  the  exact  fitting  error  criterion  has  been  introduced  by  Evans  and  Fischl  (EF)  [5, 
6].  Some  of  these  approaches  will  be  mathematically  explained  briefly  in  Subsection  II  so  that  their  relationships 
with  the  proposed  algorithm  may  be  better  appreciated. 

Among  other  important  works  on  this  subject,  a  quasi-linearization  method  similar  to  SM  was  given  in  [24] 
and  a  modification  of  SM  method  was  given  in  [63].  A  least-squares  Taylor  approach  was  proposed  in  [27]  and 
an  algorithm  rel}dng  on  general  non-linear  optimization  methods  such  as  Gauss-Newton  method  was  given  in 
[62].  In  [28]  a  two-step  modified  least-squares  criterion  was  optimized  using  Fade  synthesis  technique  and  the 
standard  Newton- Raphson  method.  In  [29],  the  first  two  terms  of  the  Taylor  series  expansion  of  the  non-linear 
criterion  was  minimized  and  in  [36]  gradient  based  algorithms  have  been  studied.  The  list  of  work  cited  here  is 
not  exhaustive,  only  some  of  the  more  importut  research  are  mentioned.  Other  related  references  may  be  found 
in  the  publications  cited.  A  thorough  treatment  on  ‘filter  design  by  modeling’  based  on  a  unified  framework  is 
presented  in  the  book  by  Jackson  [7,  see  also  10]. 

Most  of  the  approaches  cited  above  may  be  considered  sub-optimal  in  the  sense  that  theoretically  they  do  not 
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minimize  the  exact  model  fitting  error.  In  this  work  the  /3-norm  of  the  exact  fitting  error  between  the  desired  and 
the  estimated  impulse  responses  is  minimized.  The  optimization  is  performed  with  respect  to  the  denominator 
and  the  numerator  coefficients.  The  proposed  algorithm  is  closely  related  to  the  optimal  technique  proposed 
by  Evans  and  Fischl  [5-7].  It  is  well-known  that  the  optimal  EF  method  is  applicable  only  for  strictly  proper 
rational  transfer  functions.  In  certain  applications,  such  as  exponential  modeling,  the  strictly  proper  model  is 
indeed  the  appropriate  choice  and  a  complex  and  constrained  version  of  the  EF  method  has  been  very  effective 
[44-50,  60] .  But  the  EF  method  has  found  limited  applications  in  rational  transfer  function  modeling  problems 
because,  in  general,  the  highest  degree  of  the  numerator  polynomial  need  not  necessarily  be  exactly  one  less  than 
the  highest  denominator  degree  [3,  4,  25].  The  optimal  method  proposed  here  has  no  such  restrictions  and  may  be 
considered  a  generalization  of  the  EF  method.  Furthermore,  in  contrast  to  the  methods  presented  in  [1-4,  24,  30] 
no  linearization  or  modification  of  error  criterion  is  introduced  in  the  theoretical  derivation  of  the  least-squares 
model  fitting  criterion. 

One  of  the  critical  steps  in  the  theoretical  derivation  of  the  optimal  criterion  has  been  to  identify  the  complete 
basis  space  orthogonal  to  the  model  fitting  error.  It  will  be  shown  later  that  even  for  general  pole-zero  (ARMA) 
models  with  arbitrary  numerator  and  denominator  orders,  the  orthogonal  basis  space  can  be  completely  defined 
in  terms  of  the  denominator  coefficients  only.  The  fitting  error  is  then  shown  to  be  related  to  an  equation  error 
that  turns  out  to  be  somewhat  different  than  the  one  that  appears  in  the  EF  method.  But  the  form  of  the 
equation  error  is  mathematically  more  appropriate  for  the  general  case  considered  here.  The  final  error  criterion 
possesses  similar  mathematical  structure  as  in  EF  method  but  in  the  present  generalized  version,  the  dimensions 
of  both  the  ‘prefilter’  matrix  and  the  ‘data’  matrix  vary  according  to  the  numerator  and  denominator  orders. 
The  inherent  matrix  prefiltering  structure  of  the  error  criterion  directly  leads  to  formulating  an  efficient  iterative 
computational  algorithm  for  minimization. 

The  all-pole  (AR)  filter  design  problem  can  be  considered  a  special  case  of  fitting  general  pole-zero  (ARMA) 
transfer  functions  models.  Hence,  given  a  desired  impulse  response,  the  proposed  algorithm  is  also  shown  to 
produce  the  optimum  least-squares  estimates  of  the  parameters  of  an  ail-pole  (AR)  model.  Furthermore,  for 
ail-zero  (MA)  models,  the  well  known  Durbin’s  method  basically  relies  on  two  AR  model  identification  steps.  By 
utilizing  the  proposed  optimum  AR-algorithm  in  one  or  both  steps  of  Durbin’s  method,  a  modified  Durbin-type 
algorithm  is  also  presented  for  estimation  of  MA  model  parameters.  The  attractive  feature  of  this  work  is  that  the 
proposed  algor ‘  bm  provides  a  unified  and  general  framework  for  optima!  identification  of  discrete- time  systems 
from  impulse  response  data  encompassing  a  broad  class  of  HR  and  FIR  structures. 

This  Section  is  arranged  as  follows ;  in  Subsection  11,  the  problem  is  defined  and  some  related  works  are  briefly 
outlined.  In  Subsection  III  the  problem  is  formulated  for  the  pole-zero  (ARMA)  case  and  the  error  optimization 
criterion  is  derived  in  detail.  In  Subsection  IV,  the  criterion  is  appropriately  modified  to  solve  the  all-pole  (AR) 
case.  In  Subsection  V,  the  all-zero  (MA)  case  is  addressed  and  in  Subsection  VI,  several  simulation  examples  are 
given.  Finally,  in  Subsection  VII,  some  discussion  on  the  proposed  algorithm  is  given  and  the  Section  is  then 
concluded  with  some  directions  on  possible  future  research. 

n.  Problem  Statement  and  Related  Methods 


The  rational  transfer  function  model  of  a  general  recursive  IIR  digital  filter  can  be  represented  in  the  z-domain 


,  _  Op  +  -b - h  Oyz"*  .  N{z) 


where  the  coefficient  of  z^  term  in  denominator  has  been  assumed  to  be  unity  without  any  loss  of  generality. 
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Equivalently,  the  transfer  function  H{z)  can  be  also  written  in  terms  of  its  impulse  response  as, 

H{z)  =  h(0)  +  h(l)z"^  +  •  •  •  +  /i(JV  -  2)2-(^-2)  +  h{N  -  +  •  •  ■ .  (2) 

Stacking  the  hrst  N  ‘significant’  samples  of  H{x),  define, 

h  A  [A(0)  A(l)  •  •  •  h{N-  l)f .  (3) 

Next,  denote  vector  containing  the  N  samples  of  the  prescribed  or  desired  impulse  response  data  as, 

hjA[Ad(0)  h41)  ■■■  h4N-l)f.  (4) 

Paraphrasing  from  Steiglitz’s  paper  [4],  ‘in  the  best  of  worlds’,  given  a  desired  impulse  response  h,j,  ‘the  ideal 
problem’  of  optimal  estimation  of  the  parameters  a{  and  can  be  stated  as  : 

a  ft  ff/.-Ml*  _l _  /c.\ 


mm||e||*  A  ^  ]  ,  where 


_  ri,  t  =  0 
\  0,  i  ^  0, 


e  A  hj  —  h  (56) 

a  A  [uq  oi  •••  a,]’’  and  (5c) 

bA[l  6i  ...  bpf.  (5d) 

This  problem  is  known  to  be  nonlinear  in  b  and  standard  nonlinear  optimization  algorithms  have  been  suggested 
[62,  23,  25-29,  36].  Several  linearization  approaches  that  exploit  the  inherent  structure  in  this  problem  have  also 
evolved  [1-6,  24].  In  order  to  motivate  the  proposed  approach,  some  of  the  important  related  results  are  briefly 
outlined  next. 

Kalman's  Method 

In  one  of  the  earliest  work  on  this  subject,  the  solution  of  the  following  linear  problem  was  suggested  [1]  : 


min^  [D(2){Mi)}-^WW)}]  • 


The  advantage  of  this  modified  error  criterion  is  that  it  can  be  easily  minimized  in  the  least-squares  sense  w.r.t. 
the  unknown  coefficients  in  a  and  b  by  solving  a  set  of  simultaneous  linear  equations.  Apart  from  its  simplicity, 
this  approach  is  not  known  to  possess  any  optimality  property. 

Shanks’  Method 

This  is  a  two-step  approach  where  the  denominator  coefficients  are  first  estimated  by  minimizing  an  equation 
error  at  the  tail  end  of  the  impulse  response,  t.e., 
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The  minimization  of  this  optimization  critetion  is  also  known  as  the  ‘covariance  method’  of  linear  prediction  [7-11, 
52,  53].  Once  an  estimate  D{z)  of  the  denominator  polynomial  is  found,  the  numerator  coefficients  are  estimated 
by  minimizing  the  following  modified  fitting  error  norm  ; 


iiiin  5^  [hd(t)  -  {«(»•)}  ]  .  (7b) 

•=o 

Note  that  the  estimation  of  a  in  (7b)  is  again  a  linear  problem.  Burrus  and  Parks  [30]  had  presented  another 
method  that  is  closely  related  to  Shank’s  work.  The  denominator  was  found  in  exactly  the  same  manner  as  in 
(7a).  To  obtain  the  numerator,  the  first  q  error  seunples  were  forced  to  be  zeros,  i.e.,  the  first  q  samples  of  hj(n) 
were  used  as  the  best  available  estimates.  The  elements  of  a  are  then  found  from. 


k 

a*  =  Y>bih(i-k).  (7c) 

»=o 

The  noteworthy  feature  of  both  these  two-step  procedures  is  that  an  essentially  non-linear  problem  is  converted 
into  linear  problems,  but  the  methods  are  not  known  to  produce  optimal  estimates. 

Steiglitz-McBride’s  Iterative  Prefiltering  Method 

In  this  method,  an  initial  estimate  D(z)  of  the  denominator  coefficients  is  first  found  by  either  Kalman’s 
method  (6)  or  Shanks’  first  step  (7a).  Then  the  following  modified  fitting  error  criterion  is  optimized  iteratively. 


(8) 


The  estimate  D(z)  obtained  at  the  t-th  iteration  step  is  used  as  prefilters  for  obtaining  the  estimates  at  the  next 
iteration  step.  Note  that  (8)  closely  approximates  (5)  and  both  are  exactly  same  if  D(z)  =  D(z).  But  using  (8), 
the  unknown  parameters  in  a  and  b  can  be  estimated  by  solving  a  set  of  simultameous  linear  equations.  Further 
details  on  this  method  and  its  application  in  AR  and  ARMA  model-based  filter  design  may  be  found  in  [7]. 

It  should  be  mentioned  here  that,  very  recently,  McClellan  and  Lee  [34]  have  shown  that,  instead  of  optimizing 
the  original  SM  criterion  in  (8),  it  is  possible  to  split  the  optimization  problem  into  a  linear  and  a  non-linear 
problem.  But  more  interestingly,  for  the  strictly  proper  case  (p  =  9  -i-  1),  they  have  also  demonstrated  that 
if  a  is  estimated  in  a  particular  manner,  then  the  corresponding  non-linear  criterion  for  estimating  b  has  exact 
mathematical  equivalence  with  the  iterative  algorithm  of  the  optimal  £F  criterion  (outlined  next).  This  equivalence 
proof  appears  to  explain  why  the  SM  method,  which  was  originally  proposed  as  a  logical  extension  to  Kalman’s 
linearized  approach,  has  been  found  to  be  effective  in  many  applications  over  so  many  years.  It  should  also 
be  noted  here  that  another  decoupled  version  of  the  SM  method  may  be  found  in  [7]  where  a  is  estimated  by 
minimizing  the  fitting  error  norm  in  (7b).  It  will  be  shown  later  that,  for  a  given  b,  the  criterion  in  (7b)  produces 
the  optimal  least-squares  estimate  of  a. 

Evans-Fischl’s  Exact  Fitting  Error  Minimization  Metbod 

The  criteria  in  (6)-(8)  attempt  to  but  do  not  exactly  solve  the  ideal  problem  stated  in  (5).  But,  as  shown  in 
the  previous  paragraph,  the  SM  method  does  the  best  job  of  closely  approximating  the  fitting  error  and  it  has 
found  wide  applications  in  1-D  and  2-D  filtering  [4,  7,  10,  24,  39,  40,  55,  61,  63].  As  mentioned  above,  for  strictly 
proper  Hsp(z)  with  p  =  9  -I- 1,  an  optimal  approach  that  minimizes  the  exact  fitting  error  has  been  presented 
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in  [6,  6].  In  their  approach,  the  orthogonality  between  the  modeling  error  and  the  vector  space  spanned  by  the 
denominator  coefficients  was  used  to  show  that  the  following  criteria  are  exactly  equivalent  to  (5)  : 

min  l|esp(a,  b)||*  =  min  ||Bsp(Bf  pBsp)"^B|phd||^ 

a,b  b 

=  min||hjB5p(BjpB5p)“‘Bfphdl|^  (9a) 

b 

where, 


gjj^jvxjv-p,  (95) 


where  the  subscript  denotes  the  strictly  proper  case  considered  by  in  [5].  Note  that,  in  deriving  (9a),  no 
linesirization  or  approximation  had  been  introduced  at  the  outset.  The  criterion  in  (9a)  is  non-linear  and  an 
iterative  minimization  scheme  was  also  given  in  [5,  6].  The  initial  estimate  of  b  is  found  by  setting  the  ‘prefilter’ 
matrix  (BjpBjp)"^  =  This  again  results  in  the  so-called  ‘covariance  method’  of  (7a).  At  convergence 

of  the  GF  iterations,  the  optimum  b*  and  the  corresponding  minimized  error  e^p  are  obtained.  Using  the  optimal 
error,  the  ‘cleaned  up’  or  optimum  impulse  response  is  found  from. 


Bsp  A 


bp 

0 

0  - 

bp-1 

bp 

...  0 

1 

bi 

bp 

0 

1 

••  bp-i 

* 

bi 

.  0 

0 

1  . 

h*  =  hrf  —  ejp. 


(9c) 


The  first  p  —  1  terms  of  this  optimum  impulse  response  is  used  for  calculating  the  optimum  a^p  as. 


i=a 


for,  *  =  0,l,...,(p-  1). 


(9d) 


More  details  of  the  algorithm  may  be  found  in  [5,  6].  Also,  the  EF  case  can  be  considered  a  special  case  of  the 
general  algorithm  presented  here. 


The  EF  method  has  been  covered  in  detail  in  the  books  by  Jackson  [7]  and  Scharf  [10].  It  has  also  been 
extended  for  separable-denominator  2-D  filter  design  in  a  series  of  papers  [41-43].  A  modified  complex  version 
of  the  EF  method  (with  complex-conjugate  symmetric  constraints  imposed  on  b)  has  edso  been  developed  for 
maximum-likelihood  estimation  of  multiple  frequencies  or  angles-of-arrival  from  noisy  observation  data  [44,  45, 
47,  49,  60].  The  complex  version  of  the  EF  method  has  also  been  extended  to  2-D  for  simultaneous  frequency- 
wavenumber  estimation  from  array  data  [46-48,  50].  It  is  briefly  shown  next  that,  for  frequency/wavenumber 
estimation  problems,  the  original  EF  method  with  q  =  p—1  is  exactly  appropriate.  A  general  exponential  model 
can  be  defined  as, 

p 

*(n)  A  ^ate**",  for,  n  =  0, 1,2, . . . 

*=1 


where,  tk  =  iXk  +  In  this  model,  there  are  p  unknown  si’s  as  well  as  p  unknown  ai’s.  The  z-Transform  of 
z(n)  is 


=  t 


k=l 


akZ 


-23- 


Clearly,  after  summation  of  the  p  terms  in  the  right  hand  side,  the  numerator  order  will  indeed  be  one  less  than 
the  denominator  order. 

Though  the  EF  method  is  optimal  for  the  p  =  9  +  1  cases,  its  usefulness  in  rational  transfer  modeling  has 
been  very  limited  because  the  number  of  zeros  can  not  always  be  restricted  only  to  one  less  than  the  number 
of  poles  [see  4,  for  example].  It  does  not  solve  the  exact  fitting  error  optimization  stated  in  (5)  for  the  general 
ARMA  (?  ^  p  —  1)  and  the  AR  (g  =  0)  cases.  The  primary  objective  of  the  present  work  is  to  fill  this  void. 

III.  Problem  Formulation  and  algorithm  Development 

-  In  this  Subsection,  the  problem  is  formulated  for  the  ARMA  case  and  the  computational  algorithm  for  the 
fitting  error  minimization  is  derived.  The  development  and  the  execution  may  seem  similar  to  the  EF  method, 
but  it  will  be  demonstrated  that  the  proposed  algorithm  is  uniformly  applicable  in  a  broad  class  of  filter  design 
problems. 

m.l  :  GenersJ  ARMA  Case 


In  this  Subsection  the  general  ARMA  case  with  ?  <  p  —  1  is  considered.  Other  ARMA  cases  with  g  >  p 
will  be  addressed  later.  From  (1)  and  (2),  equating  the  coefficients  of  equal  powers  of  2,  the  transfer  function 
coefficients  can  be  related  to  the  impulse  response  samples  in  H{z)  as 

fal  THil 


where,  a  and  b  have  been  defined  in  (5c)  and  5(d),  respectively,  and 
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If  b  and  Hi  are  known  exactly,  a  can  be  found  simply  by  the  following  matrix-vector  multiplication, 

a  =  Hib.  (10c) 

However,  neither  the  exact  h  nor  the  matrices  Hi  and  Hj  are  known  in  practice  and  only  the  desired  impulse 
response  ha  is  available.  The  elements  of  Hi  and  Hj  in  (10)  can  be  replaced  by  the  corresponding  elements  in 
the  desired  response  ha  to  form  the  matrices  Hai  and  Haz,  respectively.  But  with  Hai  and  Ha2,  the  equality  in 
(10)  will  not  hold  and  the  lower  (AT  —  g  —  1)  equations  of  (10)  can  then  be  written  as  : 


Hazb  = 


'  M(g  + 1)  Aa(g) 
M(g  +  2)  Aa(g  +  1) 

A<f(p)  M(p  - 1) 

M^-l)  hi{N-2) 


hm  0 

hiil)  ha(0) 


ha(0) 

M^-P-1). 


=  d(b),  (11) 
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where  d(b)  is  an  "equation  error”.  It  may  be  pointed  out  again  that,  for  9  =  p  —  1,  the  minimization  of  ||d(b)||^ 
produces  the  so-called  “covariamce  method”  of  linear  prediction  [7-11,  52,  53].  It  may  also  be  recalled  that  the 
covariance  method  was  the  choice  in  [2]  and  [30]  for  linearly  estimating  b,  and  also  in  the  EF  method  [5,  6]  for 
obtaining  the  initial  estimate  of  b.  For  the  present  general  ARMA  case  with  9  <  p  —  1,  the  form  of  the  equation 
error  appearing  in  (11)  is  mathematically  more  appropriate  even  though  it  can  not  be  called  either  “coveiriance 
method”  or  “auto-correlation  method” .  As  briefly  outlined  next,  the  initial  estimate  of  b  will  be  computed  in  the 
proposed  algorithm,  by  minimizing  (|d(b)|p  w.r.i.  the  denominator  coefficients.  Equation  (11)  can  be  rewritten 

f  M?)  •••  MO)  0  0  I  rM?  +  i)l 

M<i  +  i)  •••  Ml)  MO)  0  M?  +  2) 
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.M^-1). 


+  d(h). 
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*4(9  +  1) 

=  h{p  -  1) 
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Miv-p-1). 


■M9  +  1) 
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the  minimization  of  ||d(b)(p  with  respect  to  b  =  [61  fcj  •  •  •  i„]^  results  in  the  following  initial  estimate  of  b. 


b^®)  = 


_G#g 


where  G*  A  (G^G)"^G^  denotes  the  pseudo-inverse  of  G.  Since  this  estimate  is  obtained  by  minimization  of 
an  equation  error  only,  it  does  not  necessarily  minimize  the  norm  of  the  true  model  fitting  error  ^Uld  that  remains 
the  primary  objective.  Next,  the  equation  error  d(b)  is  related  to  the  model  fitting  error  e. 

III.2  :  Fitting  Error  Minimization 

The  lower  partition  of  (10)  is  reproduced  below  : 

H2b  =  0,  (15a) 
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These  equations  can  also  be  expressed  in  a  rearranged  form  as, 


(16a) 


This  equation  establishes  a  key  relationship  between  the  equation  error  and  the  fitting  error.  But  in  order  to 
facilitate  the  minimization  of  the  fitting  error  norm  in  (5),  an  inverse  relationship  between  e  and  d(b)  is  developed 
next. 

For  a  given  b,  let  a"  denote  the  optimum  numerator  coefficient  vector  and  e(a®,b)  be  the  corresponding 
minimized  fitting  error.  Then  according  to  the  orthogonality  principle  [10  (page-325),  22],  this  fitting  error 
e(a*’,  b)  must  be  orthogonal  to  the  ‘estimate’  h(a',  b)  which  corresponds  to  the  optimum  a"  and  the  given  b.  For 
a  given  b,  if  this  orthogonality  does  not  hold,  the  non-zero  projection  of  the  error  e(a'’,  b)  onto  h  would  contain 
further  information  about  a.  This  implies  that  by  changing  a'  appropriately,  one  could  still  minimize  the  length 
of  the  error  e(a*’,  b).  This  contradicts  the  original  assumptions  about  a"  and  e(a*’,  b).  Now  the  questions  are  how 
to  construct  this  e(a',b)  which  will  be  orthogonal  to  the  estimate  of  h  and  how  that  error  may  be  lelated  to  the 
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imknomi  denominator  coefficients  in  b.  Fortunately,  the  answers  to  both  these  questions  can  be  found  in  equation 
(16)  which  clearly  shows  that  all  the  —  5  —  1)  linearly  independent  columns  of  B  are  indeed  orthogonal  to  h 
and,  furthermore,  B  is  constructed  using  denominator  coefficients  only.  Since  the  number  of  unknowns  in  a  is 
9  + 1,  it  can  be  concluded  that,  by  construction,  the  (N  —  q  —  1)  column  vectors  of  B  constitute  the  complete  basis 
space  of  e(a‘’,b)  that  is  orthogoned  to  h(a*’,b).  This  argument  implies  that  one  can  construct  the  orthogonal 
e(a‘’,b)  by  a  linear  combination  of  all  its  basis  vectors  in  the  columns  of  B,  namely. 


e(a*,b)  =  Be, 


(20) 


where,  c  A  [ci  C2  •  •  •  c^r-j-i]^  denotes  a  vector  of  constants  which  needs  to  be  determined.  In  order  to  find 
c,  it  may  be  observed  that  e(a°,b)  must  also  satisfy  (19).  Hence,  plugging  the  expression  of  e(a‘’,b)  from  (20) 
into  (19b), 

d(b)  =  (B’’B)c.  (21) 

But  (B^B)  is  a  square,  invertible,  banded  Toeplitz  matrix  and  hence,  c  has  a  unique  solution  : 

c  =  (B’’B)-M(b),  (22) 


and  using  (17c), 


c  =  (B’’B)-*B^hi. 


Plugging  this  imique  value  of  c  back  into  the  fitting  error  expression  of  (20), 


(23) 


e(a»,b)  =  B(B’'B)-^B^hd  (24a) 

A  Pabi,  (246) 

where,  Pb  denotes  the  projection  matrix  of  B.  The  fitting  error  e(a*,  b)  in  (24)  is  exactly  same  as  the  one  in  (5b), 
except  that  it  corresponds  to  the  optimum  a®.  Interestingly,  the  dependence  on  a  has  also  been  removed  in  the 
process  &om  the  expression  of  the  fitting  error.  The  problem  then  is  to  estimate  the  optimum  b®  by  minimizing 
the  /3-norm  of  e(a®,b)  w.r.i.  b.  For  an  optimum  a®,  the  minimization  problem  of  (5)  is  exactly  equivalent  to 
the  following  series  of  expressions  : 


min|le(a®,b)|l’  =  n^||B(B’'B)-^B’’hal|’, 

P  0 

(25a) 

=  minllPshall*, 

D 

(256) 

=  minbaB(B^B)“*B^ha, 

b 

(25c) 

=  minb^H3i(B’’B)-»Ha2b, 

b 

(25d) 

=  mind^(b)(B^B)"M(b). 

b 

(25c) 

Equation  (25e)  demonstrates  that  the  fitting  error  minimization  is  equivalent  to  a  “weighted  equation  error 
minimization®  where  (B^B)“^  acts  as  the  “matrix  prefilter”  or  “weight  matrix”.  At  minimum,  (25)  produces 
the  optimum  denominator  vector  b®  and  from  this  the  minimized  error  can  be  found  from. 

e(a®,b®)  =  PB.ha, 

(26) 

where,  B®  is  constructed  from  the  optimum  b®.  The  optimum  estimate  of  the  impulse  response  is  then. 


h®  =  ha  -  e(a®,b®). 


(27) 
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Using  the  optimum  values  of  h”  said  b"  in  the  top  partition  of  (10),  the  optimum  numerator  can  be  found  as 
follows, 

a®  =  H?b®.  (28) 

Equations  (25)  and  (28)  are  the  two  key  formulae  for  estimating  the  coefficients  of  the  denominator  and  numerator 
polynomials,  respectively.  Close  similarity  between  the  final  optimization  criterion  in  (25)  and  the  one  in  (9)  is 
obvious.  It  must  be  emphasized  though  that  (9)  deals  with  an  important  but  specific  case  of  strictly  proper  transfer 
function  with  q  =  p—1.  But  here,  the  derivation  of  (25)  was  based  on  appropriate  choice  of  the  orthogonal  basis 
vectors  in  B  for  any  ?  <  p  —  1.  Hence,  using  (25)  a  larger  class  of  general  pole-zero  transfer  function  model  fitting 
problems  ue  solvable. 

Intuitively,  it  appeeirs  that  the  decoupled  estimation  of  a  and  b,  as  found  above,  falls  within  a  special  class 
of  non-linear  optimization  problems  which  have  been  studied  extensively  by  numerical  analysts  [18-21].  It  has 
been  shown  in  [18]  that  in  a  non-linear  error  criterion,  if  some  of  the  unknown  variables  are  linearly  related  to 
the  error  and  the  other  variables  are  non-linearly  related  and  if  these  variables  appear  in  a  separable  form,  as 
happens  to  be  the  case  here,  then  the  two  decoupled  estimators  in  (25)  and  (28)  should  be  the  globally  optimum 
estimators  for  both  sets  of  variables.  The  derivations  leading  to  (25)  is  based  on  the  orthogonality  principle  and 
the  simple  solution  of  a  in  (28)  is  a  direct  consequence  of  (10c).  But  it  is  not  exactly  obvious  if  these  results  do 
possess  the  global  optimality  properties.  An  alternate  derivation  for  indirectly  arriving  at  (25)  and  (28)  is  given 
in  Appendix  A,  where  the  desired  optimality  properties  are  clearly  established.  A  computational  algorithm  for 
minimization  of  the  criterion  in  (25)  is  briefly  outlined  next. 

III.3  :  Computational  Algorithm 

The  criterion  in  (25)  is  non-linear  in  b  and  hence  it  can  not  be  minimized  directly.  But  instead  of  using 
standard  non-linear  optimization  techniques  the  inherent  mathematical  structure  of  the  criterion  will  be  utilized 
to  develop  soi  iterative  computational  algorithm.  The  final  form  of  the  error  vector  in  (24)  is  rewritten  as. 


e  =  B(B^B)-‘B^hd  (29a) 

A  WB^’hd  (296) 

=  WHd2b  (29c) 

=  W  Jg  :  cj  b  (29d) 

=  Wg  -f  WGb,  (29c) 

where, 

W  A  B(B’’B)-^  (29/) 


If  the  matrix  W  is  treated  as  independent  of  b,  an  expression  for  b  can  be  easily  obtained  by  minimizing  ||e|p 
w.r.t.  b  as  follows  ; 


b  =  -(WG)’*'Wg 

=  -  (G^W’’WG)~*G^W^Wg.  (30) 

But  since  W  does  depend  on  the  elements  in  b,  (30)  can  only  be  computed  iteratively.  At  the  (i  l)-th  step 
of  iteration,  W(')  is  formed  using  the  estimate  of  b  found  in  the  i-th  iteration  step.  This  leads  to  the  following 
iterative  algorithm  for  computing  b*+^  : 

r  1 

b('+i)  = 


-[xWG]->[X(‘)]g 


(31) 


where, 
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(31a) 

(316) 


=  g’’(b^^*^b(‘^)~^ 

The  iterations  are  continued  until  ||b,+i  —  b,||^  <  6,  where  j  is  an  arbitrarily  8m2tll  number.  It  must  be  noted 
here  that  the  iterations  in  (31)  may  not  always  converge  to  the  absolute  minimum  of  the  error  criterion  in  (5) 
and  hence  the  estimated  b  may  not  be  the  optimum  one.  This  is  because  in  (31)  the  variability  of  W  w.r.i.  b 
had  been  ignored  while  minimizing  ||e|p.  To  achieve  the  optimum,  the  gradient  of  the  complete  expression  of 
||e||^  must  be  set  to  zero.  If  desired,  this  can  be  done  in  a  second  phase  of  the  algorithm  which  is  outlined  in 
Appendix  B.  It  may  be  noted  here  that  the  simulation  studies  indicate  that  the  Phase-1  of  iterations  using  (31) 
does  an  excellent  job  of  bringing  the  estimate  very  close  to  the  optimum.  It  wil..  be  shown  in  Subsection  VI  that 
the  Pbase-2,  if  invoked,  only  causes  slight  changes  in  the  b  vector  and  the  minimized  error  norm.  In  simulations, 
the  convergence  was  found  to  be  quite  rapid  in  both  the  phases.  Once  the  estimates  of  b  converge,  a  is  computed 
by  following  the  steps  (26)-(28). 

II1.4  :  ARMA  Transfer  Functions  with  q  >  p 


In  this  case,  corresponding  to  the  lower  partition  of  (10),  the  equation  error  appearing  in  (11)  has  the 
following  form, 


'  hd{q  +  1) 

f^diq) 

hd{q  -  p) 

M?  +  2) 

hd(q  +  i)  ■ 

•  hd(q  -  p  +  1) 

H«b  = 

hi(p) 

hdiP-l)  ■ 

hd{0) 

.M^-1) 

hdiN-2)  . 

•  hdiN-p-1), 

(32) 


Note  that  the  initial  {q  —  p)  elements  of  h4(n)  do  not  play  any  role  in  this  equation  error.  But  again,  when 
q‘>p,  the  minimization  of  the  norm  of  this  equation  error  is  more  appropriate  than  the  use  of  either  coveuiance 
or  autocorrelation  methods  or  (11).  Similarly,  for  this  case  the  equations  equivalent  to  (17)-(19)  are. 


d(b)  A  Hdab 

0  •••  0  bp  bp^\  b\  1  •••0 

_0  •  •  ■  0  •••  0  bp  6p_x  6i  1_ 

A  B^hd 

=  B^e. 

Following  similar  arguments  leading  to  (25),  the  optimization  criterion  can  also  be  shown  to  be, 

min  ||e(a®,  b)||*  =  min  ||B(B’’B)"‘B^hrf|p 

A*  (D  D 

=  minlihTB(B^B)-‘B’’hd|p 

h 

=  minb’’Hj2(B’’B)-^Hrf2b. 


r  *^(o) 
/.„(!) 


(зза) 

(ззб) 

(33c) 

(33d) 

(34a) 

(346) 

(34c) 


Observe  that  the  matrix  B^  appearing  in  (33)  contains  (^  —  p)  leading  zero  columns.  This  has  the  net  effect  of 
zeroing  out  the  first  (q  —  p)  elements  of  h,i  and  e  in  equations  (33c)  and  (33d),  respectively.  Hence,  the  criterion 
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in  (34)  essentially  minimizes  the  norm  of  a  shorter  error  vector  e'  A  [e(?  —  p)  —  p  +  \)  ■■■  e(N  —  1)]^. 
The  iterative  algorithm  described  earlier  in  Subsection  111.3  can  again  be  utilized  for  estimating  b.  But,  if  the 
estimated  b  is  used  in  (26),  the  leading  (q  —  p)  elements  of  the  e  vector  will  turn  out  to  be  zeros.  Next,  if  (27)  is 
used  for  calculating  h®,  the  first  (g  —  p)  elements  of  the  estimated  impulse  response  will  have  exactly  same  values 
as  in  the  desired  response  h,f.  Fineilly,  while  estimating  a  using  (28),  the  leading  (g  —  p)  elements  of  must  be 
used  without  any  error  reduction. 

The  observations  in  the  previous  paragraph,  if  made  casually,  may  lead  one  to  conclude  that  the  estimates 
obtained  in  this  manner  may  not  be  optimal  because  the  leading  (g  —  p)  error  samples  are  not  minimized  at  all. 
But  in  Appendix  A  it  has  been  proved  that  (25)-(28)  do  indeed  produce  the  optimal  estimates  for  any  values  of 
p  and  g,  as  long  as(p+g  +  l)<N.  That  some  of  the  error  samples  can  not  be  minimized  is  only  due  to  the 
mathematical  nature  inherent  in  the  problem  itself  which  enforces  the  leading  (g  —  p)  elements  of  h<i  and  h  to  be 
equal  when  g  >  p.  This  should  not  be  viewed  as  a  limitation  of  the  proposed  algorithm  or  the  optimal  criterion. 
Furthermore,  once  the  denominator  is  calculated  using  (34),  one  may  consider  minimizing  the  modified  fitting 
error  criterion  in  (7b)  in  order  to  estimate  a®.  But  it  has  also  been  shown  in  Appendix  A  that  for  a  given  b,  (7b) 
and  (26)-(28)  produce  identical  estimates. 

IV.  AN  All-Pole  Filter  Design  algorithm 

The  H(z)  in  this  case  has  the  following  form, 

"  1  -I-  6iz-i  +  ■■■  +  6p.jz-0»-i)  +  bpZ-P  =  D(I) 
where,  g  =  0.  The  optimization  problem  of  (5)  can  be  restated  as, 

min  Hell*  A  min  g  [Ad(<)  -  {«(*))  ]  *•  (38) 

It  has  been  shown  in  Subsection  III  and  in  Appendix  A  that  the  equations  (25)-(28)  developed  in  Subsection  III 
is  equally  applicable  for  Vg  <  p  —  1.  Accordingly,  the  optimal  AR-algorithm  for  determining  oo  and  b  can  be 
treated  as  a  special  case  of  the  iterative  approach  derived  earlier  for  the  general  ARMA  case.  Hence  only  a  brief 
summary  will  be  given  by  defining  the  appropriate  matrices  involved  in  this  case.  The  denominator  coefficient 
vector  b  is  estimated  by  optimizing. 


min l|eAR(oo,  b)|l*  =  min  ||B^r(B5jiBar)  ‘B^RhiH® 

«0»D  O 

=  minb*’H^/*’’(B;flBAR)-'Hi/*b, 

O 

where. 


(39a) 

(396) 


(39c) 
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and 


A 


■  fc-(l) 

bs(0)  0 

•  •  • 

0 

0 

h4(q+  1) 

hi(?) 

Uo) 

0 

0 

M«  +  2) 

f^diq  +  1)  ■  •  • 

Ml) 

hm 

0 

h„(p-l) 

... 

... 

MO) 

MN-l) 

hi{N-2) 

... 

•••  hi{N-p-\)_ 

e  (39d) 


Minimization  of  ||e^ji||^  w.r.t.  b  is  carried  out  iteratively  and  at  the  (» +  l)-th  step  of  iteration,  b  is  found  from, 

1 


b(*+i)  = 


(40a) 


where. 


A  B^Jj) 


(406) 


GitJi  A 


■  hrf(O)  0  •  ••  0  •  •  •  0 

■  Ml)  ■ 

haiq)  •••  MO)  0  ••  6 

hdiq+1) 

M9  +  1)  •••  Ml)  MO)  •••  0 

hi{q  +  2) 

and  gAR  A 

: 

fi4(p-l) . .  •••  MO) 

fc<i(p) 

M{N-2) . hj(iV-p-l). 

.M^-1). 

(40c) 


As  in  (14),  the  iterations  are  started  using  the  initial  estimate  of  b  obtained  from, 

1 

b^®)  = 


(41) 


Once  the  iterations  in  (40a)  converge,  the  optimal  e  and  h  are  found  similarly  as  in  (26)  and  (27),  respectively. 
Since  ho  =  1,  the  optimal  numerator  coefficient  can  be  calculated  from  (28)  as,  oo  =  ^**(0).  Simulation  runs 
indicate  that  the  iterations  converge  rapidly  in  this  case  also.  Though  the  all-pole  case  is  considered  here  as  a 
special  case  for  the  general  pole-zero  case,  it  should  be  emphasized  that  the  all-pole  design  itself  U  an  important 
problem  in  many  applications.  To  the  best  of  the  knowledge  of  the  author,  the  optimal  solution  for  the  all-pole 
case  had  remained  unsolved. 

V.  AN  AlL-ZERO  FILTER  DESIGN  ALGORITHM 
An  all-zero  transfer  function  is  given  by, 

H(z)  =  Oo aiz”^ -1- H - -fa,z“*  A  N{z).  (42) 
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The  problem  then  is  to  estimate  a  for  a  given  where  >  9  + 1.  The  algorithms  presented  in  Subsection  III  and 
IV  are  suitable  for  ARMA  and  AR  filter  design  cases,  respectively,  and  can  not  utilized  directly  for  estimating 
MA  filter  coefficients.  But  the  most  effective  algorithm  known  for  MA  modeling  is  the  well-known  Durbin’s 
algorithm  [64]  which  relies  on  two  steps  of  AR  parameter  estimation.  Tiraditionally,  the  avtocomlation  mttkod 
of  linear  prediction  is  utilized  [7-11]  in  both  the  steps  of  Durbin’s  method,  because  it  produces  minimum-phase 
polynomials.  But  the  estimates  obtained  from  autocorrelation  method  may  not  be  optimal.  The  algorithm  given 
in  Subsection  IV  is  optimal  for  determining  AR  filter  coefficients  from  prescribed  impulse  response.  Hence,  it 
can  be  reasonably  hoped  that  the  introduction  of  the  proposed  AR  algorithm  in  one  or  both  stages  of  Durbin’s 
algorithm  may  produce  more  accurate  results.  The  proposed  modification  on  Durbin’s  algorithm  is  briefly  outlined 
below. 

Step  1  ;  Let  be  an  AR-model  of  hd(n)  with  ‘sufficiently’  large  order  L  such  that,  q  «  L  <  N,  and 

(43.) 

The  AR  coefficients  in,  A  [1  61  6/,]^,  can  be  estimated  using  the  optimal  algorithm  given  in  Sub¬ 

section  IV. 

Step  2  ;  If  £  is  chosen  large  enough  then  the  approximation  in  Step  1  will  be  very  close.  In  that  case, 


DUz) 


(436) 


Hence,  using  b^.  as  ‘data’,  the  MA  parameter  vector  a  can  again  be  estimated  using  the  optimal  algorithm 
presented  in  Subsection  IV. 

It  must  be  noted  here  that  the  usage  of  the  autocorrelation  method  at  both  the  steps  ensures  that  the  fln^ll  N{z)  is 
minimum  phase.  Instead,  if  the  proposed  algorithm  is  used,  the  minimum  phase  property  can  not  be  guaranteed. 
If  minimum  phase  property  is  indeed  desired  of  the  final  fV(z),  then  autocorrelation  method  can  be  used  (instead 
of  (41))  to  obt2un  the  initial  estimates  for  starting  the  iterative  AR-algorithm.  If  the  estimates  obtained  from 
the  iterative  algorithm  becomes  maximum  phase  at  any  iteration  step  of  the  AR-algorithm,  the  iterations  can 
be  terminated  at  that  stage.  The  estimate  found  at  the  preceding  iteration  may  be  considered  to  be  the  best 
possible  minimum  phase  estimate  that  the  iterative  AR-algorithm  can  produce.  The  drawback  of  this  scheme  is 
that  one  needs  to  root  a  polynomial  (with  possibly  large  order  in  Stejvl)  at  each  stage  of  iteration.  But  in  some 
applications,  this  extra  computational  burden  may  turn  out  to  be  an  acceptable  trade-off  in  order  to  gain  higher 
accuracy  in  the  resulting  estimates. 

VI.  Simulation  Results 

In  this  Subsection,  the  performance  of  the  proposed  algorithms  are  evaluated  by  means  of  several  ARMA(p,  q) 
and  AR(p)  model  identification  examples  with  various  p  and  q  values.  6  =  10~^  was  used  as  the  stopping  criterion 
in  both  phases  of  the  algorithms  for  all  the  four  examples  below. 

Simulaiion  1  :  The  desired  impulse  response  has  a  IViangular  form  as  shown  by  the  solid  lines  in  Fig.la  and  lb. 
The  algorithm  described  in  Subsection  III  was  used  with  p  =  7  and  g  =  4.  The  resulting  impulse  response  fit  at 
the  end  of  each  of  the  two  phases  are  shown  in  circles  in  Fig.  la  and  Fig.  lb.  The  minimized  error  norm  and 
the  closeness  of  the  fit  to  the  desired  signal  are  listed  in  Table  1 .  The  number  of  iterations  for  convergence 


-32- 


are  also  listed.  It  can  be  seen  that  there  is  no  significant  difference  between  the  1st  and  the  2nd  phase  of  the 
proposed  algorithm. 

Simulation  S :  An  arbitrary  impulse  response  was  generated  with  p  =  8  and  9  =  4.  If  the  algorithm  in  Subsection 
III  is  used  directly  to  match  the  true  response  it  will  give  perfect  answers.  Instead,  some  perturbations  or  ‘noise’ 
was  added  to  the  true  response  to  come  up  with  the  desired  response  For  30dB  ‘noise’,  the  true  and  the 
desired  responses  are  shown  in  Fig.  2a.  The  impulse  response  match  of  the  algorithm  at  the  end  of  Phase- 1 
and  Phase-2  are  shown  if  Fig.  2b  and  Fig.  2c,  respectively.  The  minimized  error  norms  and  the  closeness  to 
the  true  response  (  A  hj)  are  listed  in  the  first  two  columns  of  Table  2a,  respectively.  In  the  third  column,  the 
closeness  to  hj  is  also  given.  It  can  be  observed  that  the  closeness  figure  to  h|  may  decrease  from  Phase-1  to 
Phase-2  because  the  algorithm  is  not  attempting  to  match  that  directly.  In  Figures  2d,  2e  and  2f  and  Table  2b, 
the  corresponding  results  with  20dB  perturbation  are  given. 

Stmulaiion  S  :  In  this  case,  the  denominator  and  the  numerator  coefficients  of  Simulation  2  were  switched  to 
obtain  an  impulse  response  with  p  =  4  and  q  =  8.  For  30dB  ‘noise’,  the  true  and  the  desired  responses  are  shown 
in  Fig.  3a.  The  algorithm  in  Subsection  III. 4  was  used.  The  impulse  response  fit  at  the  end  of  Phase-I  and 
Phase-2  are  shown  if  Fig.  3b  and  Fig.  3c,  respectively.  Other  results  are  listed  in  Table  3. 

Simulation  4  ■  With  the  Triangular  desired  impulse  response  of  Simulation  1,  the  AR-algorithm  presented  in 
Subsection  IV  was  employed  with  p  =  5.  The  resulting  impulse  response  fit  at  the  end  of  each  of  the  two  phases 
are  shown  in  Fig.  4a  and  Fig.  4b,  respectively.  The  minimized  error  norm,  the  closeness  of  the  fit  to  the  desired 
signal  hi  and  the  number  of  iterations  for  convergence  are  listed  in  Table  4. 

It  can  be  fairly  concluded  from  these  simulations  that  the  Phase-1  of  the  algorithm  does  an  excellent  job  of 
error  minimization.  Hence,  the  Phase-2  of  the  algorithm  need  not  be  invoked  for  most  applications. 

VII.  Discussion  and  concluding  Remarks 

In  this  Subsection,  a  classical  rational  model  identification  problem  has  been  addressed  and,  for  the  most 
parts,  appears  to  have  been  solved.  The  major  focus  is  to  demonstrate  that,  given  a  desired  impulse  response 
corresponding  to  an  unknown  transfer  function  with  arbitrary  number  of  poles  2md  zeros,  it  is  possible  to  obtain 
the  estimates  of  the  parameters  of  the  transfer  function  that  are  optimal  in  the  least-squares  sense.  Unlike 
many  well-known  existing  results,  no  linearization  or  approximation  has  been  done  while  deriving  the  theoretical 
optimization  criterion.  The  proposed  technique  is  applicable  in  a  comprehensive  class  of  ARMA,  AR  and  MA 
model  identification  problems.  It  is  shown  that  the  multidimensional  non-linear  problem  can  be  decoupled  into 
two  smaller  problems  of  which  one  is  a  linear  problem  and  the  other  one  is  a  non-linear  problem.  The  inherent 
mathematical  structure  of  the  non-linear  part  is  utilized  to  formulate  an  efficient  iterative  computational  algorithm 
for  estimating  the  denominator  parameters.  The  numerator  is  then  found  by  a  simple  matrix- vector  multiplication. 
Global  optimality  properties  of  the  estimators  have  been  verified  by  relating  the  multi-dimensional  optimization 
problem  to  certain  well-known  results  in  numerical  anadysis.  In  simulation  studies  also,  the  method  has  been 
shown  to  be  highly  effective.  Some  important  aspects  related  to  the  algorithm  but  are  not  covered  above  are 
addressed  next  : 

Model  Order  Selection  : 

Model  order  selection  of  rational  transfer  function  models  from  impulse  response  data  remains  an  open 
problem.  This  aspect  of  the  problem  has  not  been  here.  It  appears  that  for  this  essentially  deterministic  problem, 
Akaike  Information  Criterion  (AIC)  or  Minimum  Description  Length  Criterion  (MDL)  may  not  be  applicable.  In 
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deriving  the  optimal  criterion,  the  model  orders  p  and  g  have  been  tacitly  assumed  to  be  known  or  the  estimates 
are  assumed  to  be  available.  But  in  defense  of  this  work,  it  may  be  noted  that  model  order  selection  was  not 
addressed  in  any  of  the  previous  results  on  this  problem.  In  simulation  studies,  it  was  observed  that  if  the  model 
orders  are  increased,  then,  in  some  cases,  the  estimated  response  gets  closer  to  the  desired  response.  But  over¬ 
determination  of  model  orders  may  lead  to  unstable  and/or  useless  solutions  because  the  matrices  may  tend  to  be 
singular.  Furthermore,  increasing  model  orders  also  raises  the  computational  load  on  the  resulting  filter.  Hence, 
there  is  a  trade-off  that  has  to  be  considered  in  deciding  the  proper  choice  of  the  model  order. 

Relationship  with  SM  method  and  Convergence  Analysis  of  the  iterative  algorithm  : 

As  already  noted  in  Subsection  II,  in  a  recent  paper  [34],  McClellan  and  Lee  have  shown  that  for  the  strictly 
proper  case  (p  =  9  +  1),  if  the  original  SM  method  is  decoupled  into  a  linear  and  a  non-linear  estimation  problem 
in  a  certain  manner,  the  resulting  non-linear  SM  criterion  has  exact  mathematical  equivalence  with  the  optimal  EF 
criterion  which  is  only  applicable  for  the  strictly  proper  transfer  functions.  The  SM  method  is  also  known  to  be 
very  effective  for  general  ARMA  and  AR  cases.  But  in  this  Subsection  the  optimal  criteria  for  both  ARMA  and 
AR  models  have  been  derived.  Hence  it  would  certainly  be  interesting  to  investigate  whether  the  equivalence  also 
holds  for  the  general  cases.  It  appears  that  by  using  the  new  definitions  of  the  matrices  Hi,  H2  and  B,  appearing 
in  (10),  (16),  (32),  (33)  and  (39),  McClellan-Lee’s  derivation  can  be  appropriately  modified  for  any  other  values  of 
p  and  q  also.  With  these  modifications,  it  can  be  easily  shown  that  the  original  general  SM  method  for  arbitrary 
p  and  q  can  also  be  decoupled  as  suggested  in  [34].  In  the  decoupled  form,  the  non-linear  criterion  of  the  SM 
method  can  also  be  shown  (proof  omitted  for  space  limitation)  to  be  mathematically  equivalent  to  the  iterative 
algorithm  given  in  (31)  for  minimizing  the  optimal  criterion  in  (25).  This  equivalence  proof  may  have  another 
important  consequence  for  the  proposed  algorithm.  There  already  exists  a  convergence  analysis  of  the  original 
SM  method  in  [55].  One  can  reasonably  hope  that  the  convergence  analysis  will  also  apply  to  the  decoupled  form 
of  SM  method  given  in  [34].  If  that  happens  to  be  the  case,  as  alluded  to  in  [34],  the  convergence  analysis  in  [55] 
should  also  apply  to  the  iterative  computational  algorithm  presented  here. 

Computational  Requirements  : 

The  major  computational  load  of  the  algorithm  is  in  performing  the  iterative  refinement  in  (29)  -  (31),  where, 
at  each  iteration  step,  one  needs  to  invert  an  (IV  —  9  —  1)  x  (/V  —  g  —  1)  matrix  (B^B).  It  may  appear  that 
this  inversion  should  require  C?[(iV  -  9  -  1)^]  operations.  But  (B^B)  is  a  symmetric-banded-Toeplitz  matrix  and 
many  efficient  algorithms  are  available  for  inverting  such  matrices  [9-11,  45,  56,  67].  Specifically,  in  [56]  it  has 
been  shown  that  inversion  of  such  banded  Toeplitz  matrices  only  requires  0[{N  —  9  —  l)log(7V  —  9  —  1)]  -I-  0[p^ 
operations.  Furthermore,  in  the  SM  method,  the  calculation  of  the  impulse  response  of  the  inverse  filter  and  data 
filtering  are  required  at  every  step  of  iteration,  whereas  the  proposed  method  uses  the  estimated  b  directly  to 
form  the  B  matrix. 

Future  Work  : 

Many  of  the  1-0  algorithms  discussed  in  the  Introduction  have  been  extended  to  2-D  for  estimating  2-D  filter 
coefficients  from  spatial  domain  data  [29, 35,  36, 39-43].  It  appears  that  by  identifying  the  appropriate  orthogonal 
subspaces,  it  should  be  possible  to  formulate  an  optimal  2-D  filter  design  algorithm  by  extending  the  optimal  1-D 
method  proposed  in  this  Section.  It  is  also  possible  to  extend  this  work  for  identification  of  Multidimensional 
systems  from  multidimensional  impulse  response  data  [38].  Work  on  these  topics  are  under  progress  [65,  66]  and 
the  preliminary  results  look  encouraging. 
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APPENDIX  A 

An  Alternative  Derivation  of  the  Error  Criterion 


In  this  Appendix,  an  alternate  derivation  is  given  for  the  error  criterion  in  (25).  This  derivation  reaffirms 
that  the  optimal  criterion  for  estimating  b  using  (25)  and  the  estimate  of  the  numerator  in  (28)  are  indeed 
exactly  equivalent  to  the  original  criterion  appearing  in  (5).  The  criterion  in  (5)  needs  to  be  optimized  w.r.t.  two 
sets  of  parameters  in  a  and  b,  where  a  is  linearly  related  to  the  error  whereas  b  has  a  non-linear  relationship. 
In  this  appendix,  it  is  shown  that  utilizing  filtering  interpretation  it  is  possible  to  split  this  multidimensional 
optimization  problem  into  a  linear  estimation  problem  for  a  and  a  non-linear  optimization  problem  for  b.  The 
filtering  interpretation  also  makes  it  possible  to  relate  this  problem  to  certain  non-linear  optimization  problems 
studied  by  numerical  analysts  [18-21]. 

Let  Hi,{z)  be  the  inverse  filter  corresponding  to  D{z),  i.e., 

Diz)m{z)  =  1.  (A.l) 


This  is  a  convolution  operation  where  the  discrete  sequence  6i’s  in  D{z)  are  finite  whereas  the  ht(n)’s  in  Hi,{z) 
are  infinite  in  extent.  In  matrix  notation  this  convolution  operation  may  be  expressed  as. 
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where,  denotes  an  JV  x  JV  identity  matrix.  Now,  rewriting  (1), 


H{z)  = 


A 


D{z' 

H,(z)N(z). 


(^•3) 


The  right  hand  side  in  (A.3)  again  represents  convolution  of  hi(n)  with  the  numerator  coefficient  sequence  a„. 
In  matrix-vector  notation,  the  vector  h  containing  the  impulse  response  values  in  (A.3)  can  be  represented  as. 


h  A  H'ja 

where,  Hj  contains  the  first  9  -f  1  columns  of  H4,  t.e.. 
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With  these  definitions,  the  problem  stated  in  (5)  can  be  rephrased  as, 

rainllelp  A  min|lhd  -  Hia|l’, 

a,b  =  a.b 


where,  using  (A. 4)  in  (5b),  the  error  vector  is  formed  as, 


(A.4) 
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e  A  hd  —  Hja.  (A.5a) 

This  equation  clearly  shows  the  linear  relationship  between  the  error  e  and  a  and  also  the  non-linear  relationship 
betweer^  a  and  b  through  the  matrix  H^.  In  this  form,  it  is  apparent  that  the  present  problem  belongs  to  a  larger 
class  of  mixed  optimization  problems  where  the  linear  and  nonlinear  variables  appear  separately.  This  class  of 
problems  have  been  studied  extensively  in  numerical  analysis  literature  [18-21].  The  main  objective  is  to  optimize 
the  two  sets  of  variables  independently.  If  (t.e.  b)  is  exactly  known,  then  the  minimization  of  (A.5)  will 
produce  the  linear  least-squares  estimate  of  a  as  follows, 


a  A  H;^hd. 


(A.6) 
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In  practice,  b  will  not  be  known  and  needs  to  be  estimated.  Plugging  a  back  in  (A.5),  the  optimization  criterion 
for  b  is  given  by, 


minllhd  -  Hia|l2  =  minllhd  - 

a,b  D 

{A.7a) 

=  njinllhd  -  PH;bd||^ 

{A.lb) 

=  minlKIjv  -  PH;)h<i||’. 

{A.7c) 

For  a  larger  class  of  problems,  it  has  been  proved  in  Theorem  2.1  of  [18]  that  if  b  is  estimated  by  minimizing  the 
criterion  in  (A. 7)  and  if  that  estimate  is  utilized  in  computing  a  using  (A. 6),  then  the  resulting  estimates  are  the 
unique  and  global  minimizers  of  the  criterion  in  (A.5). 

The  derivation  of  the  optimization  criterion  in  (A,6)-(A.7)  is  concise,  though  rigorous,  but  direct  optimiza¬ 
tion  of  (A. 7)  would  require  taking  resort  to  standard  non-linear  optimization  techniques.  This  is  because  the 
parameters  in  b  are  related  to  the  error  criterion  in  a  complicated  manner  through  Ph;-  Next,  the  criterion  in 
(A. 7)  is  reparameterized  .so  as  to  relate  it  directly  to  the  coefficients  in  b.  Appropriately  partitioning  the  matrices 
D  and  Ht  as  follows, 
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f  A*(0) 
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h»(0) 

0  1 

0 

0 

*»(«) 

hi,{q- 

1) 

*6(0)  1 

0 

0 

A  [HilHi'], 

hi,{q  +  1) 

*6(9) 

*6(1)  1 

*6(0) 

0 

hi{N  -  1) 

hi{N- 

2) 

*6(1) 

*6(0) 

(A.8a) 


{AM) 


the  equation  (i4.2)  can  be  written  as. 


or. 


=  In 


■  BjHi 

BjHi'  1 

1  0(,+i)x(yv-,-i) 

1  — 

.  b’’h; 

_ 

B^H" 

0(Ar-,-i)x(«-n) 

I(N-q-l)xiN-i-l), 

{AM) 


{A.8d) 


It  should  be  noted  that  in  the  partitioning  example  above,  only  the  9  <  p  -  1  case  is  shown,  but  the  results  are 
equally  applicable  for  any  values  of  p  and  q.  The  bottom-left  corner  element  shows  that  the  N  x.  {N  —  q  —  1) 
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matrix  B  and  the  TV  x  (5  + 1)  matrix  are  orthogonal,  i.e.,  B^Hi  =  0(s-f-i)x(i+i)  “d  since  by  construction 
both  are  full-rank  matrices,  it  is  also  true  that, 

ran/b(B)  -f  ran/b(Hi)  =  (-4.9) 

Hence,  using  a  theorem  on  projection  matrices  [54], 


Pb  +  Ph;  =  In-  (j4.10) 

Using  this  result  in  (A. 7c),  the  following  reparameterized  form  of  the  optimization  criterion  is  obtained, 

njin  llPshdll^.  (A.ll) 

t> 

Interestingly,  this  criterion  is  identical  to  the  one  in  (25b)  which  was  derived  by  relating  the  fitting  error  to 
the  equation  error.  It  should  be  mentioned  here  that  for  the  strictly  proper  case  with  p  =  9  1,  an  analogous 

derivation  appeeirs  in  [7].  Equation  (A.ll)  clearly  shows  that  if  the  criterion  in  (25)  is  optimized  to  estimate  b 
and  (A. 6)  is  used  to  estimate  a,  then  these  estimates  are  the  optimal  solutions  for  the  problem  stated  in  (5).  But 
it  may  be  recalled  that  the  optimal  numerator  coefficients  were  found  in  Subsection  III  using  equation  (28),  i.e., 

a»  =  HJb“,  (A.12) 

where,  the  superscript  ®  denotes  optimized  values  obtained  from  (25)-(27).  Hence,  all  that  is  left  is  to  show  that, 
once  b®  is  estimated  from  (25),  the  equations  (28)  and  (A.6)  produce  identical  estimates.  This  is  proved  next. 

Similar  to  (17),  the  equation  (A. 12)  can  also  be  rewritten  as, 


Brh", 

(A.13c) 

^^‘’(hd  -  e®),  using  (27), 

(A.136) 

Br(hd  -  hi  -t- 

using  (A.7a), 

(A.13c) 

(BrH‘’;)Hr’^hd. 

(A.  13d) 

H®'/hd, 

(A.13e) 

where,  the  last  equality  uses  the  fact  that,  =  I(,+i),  which  appears  in  the  upper-left  partition  of  (A.8c). 

This  completes  the  proof  of  equivalence  of  both  derivations.  The  derivation  given  in  this  Appendix  reveals  the 
globally  optimal  properties  of  the  estimates  obtained  Subsection  III.  It  may  be  observed  that  (28)  may  be  preferred 
over  (A.6)  in  computing  a  because  the  computation  of  ht  and  the  pseudo-inverse  solution  can  be  avoided,  whereas, 
computation  of  h®  may  be  a  necessary  step. 

It  should  be  emphasized  here  that  the  derivation  in  this  Appendix  did  not  make  any  assumption  about  the 
relation  between  q  and  p.  Hence,  the  optimal  algorithms  derived  here  and  the  equivalent  results  in  Subsection  III 
are  equ2tlly  applicable  for  any  AR  and  ARMA  cases  with  arbitrary  model  orders  p  and  9.  The  only  restriction  is 
that  the  number  of  unknowns,  (p  4-  9  -h  1)  be  less  than  or  equal  to  the  number  of  observations,  TV.  This  can  be 
seen  from  (10),  where  the  (9  -f  1)  equations  in  the  upper  partition  are  utilized  in  estimating  the  (9  -f  l)-length 
a-vector.  In  the  bottom  partition,  there  are  (TV  —  9  —  1)  equations  to  solve  for  the  p  unknowns  in  b.  Uniqueness 
of  the  estimated  b  requires  (TV  —  9  —  1)  >  p.  This  appears  to  be  a  powerful  result  because  it  substantiates  that 
the  applicability  of  proposed  optimal  algorithm  encompasses  a  wide  range  of  filter  design  problems. 
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APPENDIX  B 


Computational  Algorithm  :  Phase  II 


In  this  appendix  the  second  phase  of  the  iterative  algorithm  is  described  in  detail.  In  this  phase,  the  derivative 
of  the  matrix  W  w.r.i.  b  is  taken  into  consideration  while  minimizing  the  fitting  error  norm.  The  complete  error 
expression  is  rewritten  below, 

||e(a^b)||2  =  e’’(a^b)e(a^b).  (B.l) 

By  setting  the  derivative  of  this  squared  norm  to  zero,  we  obtain  the  updated  b^*"*"*)  at  the  (i+  l)-th  iteration  as. 


where  (suppressing  the  superscript  (*)), 


L  A 


U  A  L^W  +  G^W^W, 

aw 


dbi 


d(b) 


aw  ^ 

dbt  =  dbt  ’ 


(B.2) 

(B.2a) 

(B.26) 

(B.2c) 


dbk  =  dbt 


dbt^ - " 


and 


(B.2d) 


^  has  the  same  form  as  the  B  matrix  defined  in  (16a)  but  filled  with  all  zeros  except  at  the  locations  where  bt 
appear.  For  example. 


aB 

dbr, 


Once  b^*"*"*)  is  found,  can  be  formed  as, 

b('+‘)  = 


■0 

...  1 
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•  •  0- 
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...  0 
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••  0 
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: 

...  0 

0 

-.  1 
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...  0 

0 

••  0 
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•.  0 

.0 

...  0 

0 

••  0. 

€1R 


NxN-q 


(B.2e) 


1 


-[UWG]-»[U(0]gJ 


(fl.3a) 

(B.3i) 


This  minimization  phase  continues  until  b*'*'*  ~  b*  is  reached  and  this  optimum  b*  vector  corresponds  to  a 
minimum  of  the  error  surface  of  ||e(a*,  b)!!*. 


Closeness 
with  ht)  in  dB 

Number  of 
Iterations 

30.5173 

6 

30.6740 

3 

Table.  1  :  Results  of  Simulation  1.  A  triangular  Impulse  response  is  mod¬ 
eled  by  an  ARMA  model  with  p  =  7  and  9  =  4.  f  =  10~^  was 
used  as  the  stopping  criterion. 


u.s  ts.t 

of 


Fig.  la  :  Simulation  1  A  triangular  Impulse  response  is  modeled  by  an 
ARMA(7,4)  model.  £  =  10~^.  Result  shows  fit  after  Phase- 1 
convergence. 
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Fig.  lb  .  Simulation  1  A  triangular  Impulse  response  is  modeled  by  an 
ARMA(7,4)  model.  B  =  10"®.  Result  shows  fit  after  Phase-2 
convergence. 


Iteration 

Phases 

Minimized 

error  norm 

Closeness 
with  ht,  in  dB 

Closeness 
with  hi,  in  dB 

Number  of 
Iterations 

Phase  1 

0.010177 

25.8552 

22.6084 

1 

Phase  2 

0.010079 

25.30777 

22.05049 

2 

Table  2a  ;  Results  of  Simulation  2.  30dB  perturbation  was  added  to  a  true 
ARMA(8,4)  impulse  response  (h*)  to  form  the  desired  response 
(hi).  6  =  10~®  was  used  as  stopping  criterion. 


Fig.  2a  :  Simulation  2  :•  lYue  and  30dB  perturbed  A11MA(8,4)  impulse 
responses 


Fig.  2b  :  Simulation  2  :•  ARMA(8,4)  IVue  and  estimated  impulse  re¬ 
sponses  after  Phase  1  convergence.  SNR=30dB,  6  =  10~^. 


Fig.  2c  :  Simulation  2  :•  ARMA(8,4)  IVue  and  estimated  impulse  re¬ 
sponses  after  Phase  2  convergence.  SNRsSOdB,  S  =  10“®. 


Iteration 

Phases 

Minimized 

error  norm 

Closeness 
with  ht,  in  dB 

Closeness 
with  hrf,  in  dB 

Number  of 
Iterations 

Phase  1 

■ 

16.91 

12.00 

1 

Phase  2 

■ 

15.22 

12.178 

2 

Table  2b  :  Results  of  Simulation  2.  20dB  perturbation  was  added  to  a  true 
ARMA(8,4)  impulse  response  (ht)  to  form  the  deured  response 
(hrf).  S  s  10~’  was  used  as  stopping  criterion. 


Fig.  2d  :  Simulation  2  IVue  and  20dB  perturbed  ARMA(8,4)  impulse 
responses 


Fig.  2f :  Simulation  2  ARMA(8t4)  lYue  and  estimated  impulse  re¬ 
sponses  after  Phase  2  convergence.  SNRs;20dB,  6  =  10”*. 


Iteration 

Phases 

Minimized 

error  norm 

Closeness 
with  ht,  in  dB 

Closeness 
with  h,^,  in  dB 

Number  of 

Iterations 

Phase  1 

■ 

24.837 

22.8754 

5 

Phase  2 

■ 

23.869 

23.054 

3 

Table  3  :  Results  of  Simulation  3.  30dB  perturbation  was  added  to  a  true 
ARMA(4,8)  impulse  response  (ht)  to  form  the  desired  response 
(hd).  d  =:  10~*  was  used  as  stopping  criterion. 


Fig.  3a  ;  Simulation  3  Ihie  and  30dB  perturbed  AIIMA(4,8)  impulse 
responses 


Fig.  3b  ;  Simulation  3  ARMA(4,8}  T^e  and  estimated  impulse 

spouses  after  Phase  1  convergence.  SNR=30dB,  6  s=  10“®. 


Fig.  3c  ;  Simulation  3  :•  ARMA(4,8)  IVue  and  estimated  impulse  re¬ 
sponses  after  Phase  2  convergence.  SNRs30dB,  S  =  10“*. 


Iteration 

Phases 

Minimized 

error  norm 

Closeness 
with  ht,  in  dB 

Number  of 

Iterations 

Phase  1 

3.0368 

23.4366 

4 

Phase  2 

3.0348 

23.4395 

3 

Table.  4  :  Results  of  Simtdation  4.  A  triangular  Impulse  response  is  mod¬ 
eled  by  an  AR  model  with  p  =  5.  S  =  10~^  was  used  as  the 
stopping  criterion. 
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Fig.  4a  :  Simulation  4  :•  A  triangular  Impulse  response  is  modeled  by 
an  AR(5)  model.  S  s  10**^.  Result  shows  fit  after  Phase- 1 
convergence. 


Fig.  4b  :  Simulation  4  A  triangular  Impulse  response  is  modeled  by 
an  AR(5)  model.  6  =  10~^.  Result  shows  fit  after  Phase-2 
convergence. 
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SECTION  2.3  :  DESIGN  OF  2-D  RECURSIVE  DIGITAL  FILTERS  IN  THE  S’f’  iIAL  1.10MAIN 


Summary 

A  class  of  least-squares  algorithms  for  design  of  two-dimensional  digital  irom  space  domain  data  is 

presented.  The  proposed  algorithms  iteratively  estimates  the  filter  coefficients  by  minimizing  the  true  squared 
error  between  the  given  and  the  estimated  space  domain  responses.  The  algorithms  are  essentially  generaliza¬ 
tion  of  the  optimal  1-D  design  algorithm  given  by  Evans  and  Fischl  [7].  The  error  criterion  is  simultaneously 
optimized  w.r.t.  the  coefficients  in  both  dimensions.  Design  algorithms  are  given  for  filters  with  separable  and  ir¬ 
reducible  numerator/denominator  polynomials  and  rdso  for  mixed  structures.  The  effectiveness  of  the  algorithms 
is  illustrated  with  several  simulation  examples. 


I.  Introduction 


Given  the  space  domain  unit  sample  response  (2-D  impulse  response)  data,  an  important  design  problem  is 
to  obtain  the  optimal  coefficients  for  the  two-dimensional  infinite  impulse  response  (HR)  filters.  This  problem 
has  received  considerable  attention  in  the  recent  years  e.g.,  see  [1-4,10].  The  relationship  between  the  time  do¬ 
main  impulse  response  data  and  corresponding  1-D  recursive  filter  parameters  are  well  established.  Analogous 
relationships  can  also  be  established  between  the  space  domain  data  and  the  corresponding  2-D  recursive  filter 
parameters.  Exploiting  these  similarities,  many  2-D  filter  synthesis  algorithms  have  been  developed  by  mod¬ 
ification  and  extension  of  existing  algorithms  for  1-D  filter  design.  Specifically,  Shanks  ei  al  [1]  extended  the 
work  of  Shanks  [5] ;  Cadzow  [2]  and  Shaw  and  Mersereau  [3]  utilized  many  of  the  general  non-linear  optimization 
methods;  and  Shaw  and  Mersereau  [3]  also  extended  the  work  of  Steiglitz  and  McBride  [6].  But  these  methods  are 
suboptimal  in  the  sense  that  they  do  not  optimize  the  exact  error  criterion.  In  contrast  to  these  approaches,  the 
iterative  method  (EFM)  proposed  by  Evans  and  Fischl  [7]  is  optimal  and  it  does  optimize  the  true  error  criterion. 
Recently  some  work  [10]  has  been  done  on  the  2-D  extension  of  EFM  for  spatial  domain  design,  but  we  believe 
that  the  full  potential  of  EFM  has  not  been  utilized  for  2-D  recursive  filter  design.  The  maiin  drawbacks  of  the 
approach  in  [10]  is  that  the  complete  error  criterion  was  not  optimized  and  the  second  phase  of  EFM  was  not 
invoked.  Also,  the  error  criterion  was  not  optimized  w.r.t.  the  filter  coefficients  in  two  domains  simultrineously. 

The  Evans-Fischl  method  is  extremely  effective  in  1-D  filter  design.  A  modified  complex  version  of  the  EFM 
with  certain  symmetry  constraints  has  recently  been  shown  to  be  equally  effective  for  maximum-Ukelihood  1-D 
and  2-D  frequency-wavenumber  estimation  [8,9].  In  this  work  we  develop  a  2-D  version  of  the  EFM  in  order  to 
establish  a  general  framework  for  optimal  and  sub-optimal  design  of  2-D  recursive  filters  from  the  given  space 
domain  data. 

This  Section  is  arranged  as  follows  :  In  Subsection  II,  the  least-squares  problem  is  formulated.  In  Subsection 
III,  the  case  with  separable  numerator  and  separable  denominator  polynomials  is  considered  in  detail  and  the 
error  criterion  is  related  to  the  case  with  irreducible  numerator.  In  Subsection  IV,  the  irreducible  case  is  outlined 
briefly.  In  Subsection  V,  simulation  results  are  provided  to  illustrate  the  effectiveness  of  the  proposed  algorithm. 
Finally,  in  Subsection  VI  some  concluding  remarks  are  given. 

II.  Formulation  of  the  2-D  Least-Squares  Synthesis  Problem 


In  the  general  form,  a  2-D  rational  function  H{zi,Z2),  with  non-decomposable  numerator  and  denominator 
polynomials  is  described  by; 


^(^1,^2) 


Qizi,Z2)  ^  ErioE?Iog(«.J>r'^2~^ 
^(^1 .  ^2)  YT=0  ErJo  P(*.  7)2r‘  22  "  ’ 


(1) 
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Equivalently,  H{zi,Z2)  may  also  be  written  as, 

fi{2l,22)  =  zfHz2  (2) 

where,  Zi  A  [1  •  •  •  zf 22  A  [1  z^^  •  • -Zj  and 

A(0,0)  A(0,1)  •••  MO,  *2-1) 

MI.O)  Mi.i)  •••  Ml.  *2-1) 

M*1-1,0)  MM-1.1)  M*i-i,*2-i) 

In  (3),  H  contains  the  ki  x  k2  significant  impulse  response  values.  By  stacking  the  columns  of  H,  the  impulse 
response  may  be  represented  in  vector  form  as, 

he  A  [hf  hj  •••  (4) 

where,  h<  denotes  the  column  of  H.  Next,  let  the  given  space-domain  impulse  response  matrix  be, 

z(0,0)  z(0,l)  •••  z(0,*2-l) 

z(l,0)  z(l,l)  •••  X(l,*2-1) 

z(*i-l,0)  z(*i-l,l)  •••  z(*i-l,k2-l) 

and  the  corresponding  vector  be; 

Xc  ^  [xf  •••  xj’j’’.  (6) 

In  this  work,  we  address  the  2-D  least-squares  space-domain  synthesis  problem  as  stated  below  : 

Given  the  space-domain  2-D  impulse  response  matrix  X,  estimate  q  A  [?(0, 0)  ?(0, 1)  •  •  •  9(01, 02)]^  and  p 
A  [p(0, 0)  p(0, 1)  •  •  •  p(mi ,  m2)]^  by  optimizing  the  following  f2-Dorm  based  error  criterion, 

min||e|p  A  ||xe  -  hc||*  witbp(0,0)  =  1.  (7) 

q.p  = 

Next,  we  formulate  and  outline  the  optimization  procedures  for  several  specied  cases  of  interest. 

III.  Design  With  Separable  Denominator  and  Separable/Irreducible  Numerator 


In  this  Subsection,  we  will  first  perform  the  derivation  for  the  separable  numerator  case  and  then  relate  our 
results  to  the  irreducible  numerator  case.  It  may  be  noted  that  the  optimal  sepuable  denominator  polynomials 
are  same  for  both  cases. 

The  2-D  rational  transfer  function  with  separable  numerator  and  denominator  polynomials  cam  be  written 


as, 


H(zi,Z2) 


Er=oM«)^f  e;=oKj>2~^ 
E::‘oc(.>rErjoMj>2'^' 


(8) 


Note  that  for  strictly  proper  case,  mi  =  ni  -t- 1  and  m2  =  n2  -b  1.  The  numerator  and  denominator  coefficients 
in  equations  (1)  and  (8)  are  related  as  follows: 


q{i,j)  =  a{i)b{j)  for  0  <  t  <  ni,  0  <  j  <  n2 

p{i,j)  =  c{i)d{j)  for  0  <  *  <  mi,  0  <  j  <  m2 


(9a) 
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The  space-domain  transfer  function  H(zi,Z2)  can  also  be  written  in  the  separable  form  as, 

t=0  J=0 

where,  f  A  [/(O)  /(I) •  -/(hi  -  1)]’^, 


zft  =  ‘£‘  /(■>.-“  A  (ii«) 

t=o  Z^i=o 

and 

g"..  =  E  j(iW'  A  (116) 

From  (10),  the  2-D  space-domain  impulse  response  values  in  (3)  are  related  to  the  separable  1-D  impulse  response 
values  as, 

A(».i)  =  /(OffO).  for  0  <  »■  <  hi  -  1, 0  <  i  <  *2  -  1  (11c) 

From  (11a),  the  transfer  function  coefficients  in  a  A  [a(0)  a(l)  •  •  ■  a(ni)]^  and  c  A  [c(0)  c(l)  •  ■  •  c(mi))^  are 
related  to  the  impulse  response  values  in  f  as  [7] 


where. 


Fi  A 


F2  a 


/(O) 

[S]  =  [ 

0 

/(I) 

/(O)  ••• 

./(mi  -  1) 

/(mi  -  2)  •  •  • 

'  /(”»i) 

/(mi  -  1)  -  •  ■ 

/(mi  -1- 1) 

/(mi) 

./(hi-  1) 

/(hi  - 2)  . 

0  0- 

0  0 

m  0. 


/(hi -mi)  /(hi-mi-1). 


Note  that  the  lower  partition  of  (12)  can  also  be  written  as, 

0  =  Fjc  =  C'^f 

where,  C  is  hi  x  (hi  —  mi)  banded  Toeplitz  matrix  defined  as. 


c(mi) 

0 

0 

c(mi  -  1) 

c(mi)  •• 

0 

c(0) 

c(l)  ••• 

c(mi) 

0 

c(0)  ... 

c(mi  -  1) 

0 

0 

c(0)  . 
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Proceeding  in  a  similar  manner  from  (116),  we  get  the  relationships  between  b  A  [6(0)  6(1)  ■  •  •  6(n2)]^,  d  A  [d(0) 
d(l)  •  •  •  d(m2)]^  and  the  impulse  response  values  in  g  as 


Again,  the  lower  partition  of  (16)  can  be  written  as, 

0  =  Gad  =  D'^g, 


(16) 


(17) 


where,  Gi ,  G2  and  D  are  defined  in  a  manner  similar  to  Fi ,  F2  and  C,  respectively,  with  appropriate  dimensions. 
Now,  for  the  2-D  transfer  function  H(zi,Z2)  as  defined  in  equations  (8)-(10),  the  coefficients  zae  related  to  the 
2-D  impulse  response  values  as  follows: 


"a  ®  b' 
0 
0 

.  0  . 


where,  (8  denotes  the  Kronecker  product.  Following  (9a),  for  the  case  of  irreducible  numerator  design,  the  terms 
‘*(*)^(j)>  for  *  =  l.  -  -.ni  and  j  =  1, ...  ,mi  in  a ®  b  on  the  left  hand  side  should  be  replaced  by  ^(i.j).  Also 
in  (18),  Cl  and  C2  become  ki  x  mi  jmd  ki  x  m2  matrices,  respectively,  defined  as. 


rc(o) 

c(l) 

c(mi  - 

0 

c(0) 

. . . 

c(mi  - 

2) 

Cl  A 

0 

0 

0 

c(0) 

and  (19a) 

0 

0 

0 

0 

.  0 

0 

0 

0 

rd(0) 

d(l) 

d(m2  - 

!)■ 

0 

d(0) 

d(m2  - 

2) 

a 

II  i> 

0 

0 

0 

d(0) 

(196) 

0 

0 

0 

0 

■ 

.  0 

0 

0 

0 

. 

and  using  (4)  and  (11c), 

he  = 

:  f®g. 

(20) 

Hence,  the  three  bottom  partitions  of  (18)  can  be  expressed  as 


1 1  ®  Gi 

Fi  ®  G2 
F2  ®  Gi 

F2  ®  G2  J 

Cj'  ® 

Cf 


(c  ®  d] 


(f®g]  = 


Cf 


he 


(18) 


C^®Dj' 

® 

L,  « 


h. 


0. 


(21) 
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Equation  (21)  essentially  implies  that  the  matrices  Ci^D,  C0Di  and  C®D  are  orthogonal  to  he-  Moreover,  by 
construction,  these  matrices  contain  kik2  —  mimj  linearly  independent  vectors  of  length  kik2-  The  remaining 
mi  m2  linearly  independent  vectors,  that  complete  the  entire  kik2  dimensional  vector  space,  are  in  Ci  ®  Di  as 
may  be  seen  by  rewriting  the  upper  partition  of  equation  (18)  as  follows. 


[a®  b] 
or,  [a®b] 


[Fi  ®Gi][c®  d] 
[Cf  ®Df][f®g] 
[Cr^Dflhe 


(22a) 

(226) 

(22c) 


Therefore,  the  search  for  the  coefficient  vectors  c  and  d  is  equivalent  to  the  search  for  kik2  —  mi  m2  linearly 
independent  vectors  which  are  orthogonal  to  the  impulse  response  vector  he.  In  practice,  we  do  not  have  he  and 
if  we  replace  he  by  the  given  space  domain  impulse  response  values  Xe  as  defined  in  (6),  the  r.b.s.  of  (21)  will  not 
be  equal  to  zero  and  there  will  be  some  equation  error  which  we  define  as  e,  i.e., 


c'^(»dT 

C^®D^ 


Xe  =  e. 


(23) 


But  from  C^),  Xe  =  he  +  e  and  hence, 


‘Cf  ®D’’' 
C^®Df 
C^®D^ 


p»c  +  e]  =  e. 


Substituting  (21)  in  (24)  we  see  that  the  fitting  error  e  in  (7)  can  be  related  to  the  equation  error  e  as. 


rc];®D’’i 

c^®dT 

C^®D^ 


e 


e. 


(24) 


(25) 


Following  the  EFM,  the  estimation  of  the  numerator  and  denominator  coefficients  are  now  separated  into  two 
parts.  If  the  denominator  coefficients  are  known  then  the  numerator  coefficients  may  be  estimated  from  (22c)  by 
replacing  h,,  by  Xg.  It  may  be  noted  here  that  a  direct  implementation  of  (22c)  will  give  numerator  coefficients 
q(t,  j)’s  which  are  the  coefficients  of  the  optimum  irreducible  numerator  polynomial.  This  polynomial,  in  general, 
will  not  be  separable.  Hence,  to  obtain  the  separable  form,  we  have  to  find  (using  SVD)  the  r2mk-l  approximation 
of  the  matrix  Q  formed  with  the  elements  9(t,i).  Except  for  the  scale  factor  which  is  the  largest  singular  value, 
the  first  column  and  row  singular  vectors  will  contain  the  coefficients  of  the  separable  numerator  polynomials,  a 
and  b,  respectively. 

For  determining  the  optimd  separable  denominator  polynomials,  we  have  to  make  use  of  certain  orthogonality 
conditions  [8,9]  in  order  to  show  that  the  minimization  of  ||e|p  is  exactly  equivalent  to  optimizing  the  following 
criterion, 

®^d)  +  (7^c®Ita)  -  {'Pc®'Pd))^c)  (26) 

c,a 

where,  I*,  G  IR*’***'  and  It,  £  are  identity  matrices  and 

Vd  a  D(D’’D)-‘d’’  and 
Vc  A  C(C^C)-iC^ 


are  projection  matrices. 


(27a) 

(276) 
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For  complex  data  and  complex  coefficients,  an  iterative  algorithm  for  optimization  of  a  criterion  very  similar 
to  the  one  in  (26)  but  with  certain  symmetry  constraints  was  developed  in  [8]  and  [9].  Using  derivations  similar 
to  the  ones  in  [8]  and  [9]  we  can  show  that,  similar  to  EFM,  a  quasi-linear  relationship  can  be  established  between 
the  error  and  the  polynomial  coefficients  in  c  and  d  which  enables  us  to  optimize  the  criterion  iteratively  without 
resorting  to  any  general  optimization  algorithms. 

It  may  be  pointed  out  here  that  in  [10-12],  only  the  first  term  in  (26)  was  minimized  separately  with  one 
set  of  denominator  coefficients.  Hence,  the  estimates  of  the  denominator  coefficients  are  suboptim£il  because  they 
were  not  obtained  by  optimizing  the  true  error  criterion.  Also,  following  [8,9]  and  unlike  [10],  the  optimization 
of  (26)  can  be  carried  out  w.r.i.  both  sets  of  parameters  c  and  d  simultaneously.  Details  eire  omitted  and  will  be 
published  elsewhere.  In  the  next  Subsection,  we  generalize  our  result  to  the  irreducible  case. 

IV.  Optimal  Design  with  Irreducible  Numerator  and  Denominator 

Rewriting  (1)  as, 

H(2i,Z2)P(zi,Zi)  =  Qizi,Z2)  (28) 

and  equating  the  coefficients  of  like  powers  of  z^'z^^  V  i,j,  we  get  the  following  relationship  between  the 
coefficients  and  the  space-domain  impulse  response. 


This  relationship  is  equivalent  to  the  one  derived  in  (18)  for  the  separable  case  though  it  should  be  emphasized 
that  the  coefficients  and  the  space-domain  response  are  not  seprurable  in  the  present  case.  Specifically,  the 
numerator  coefficients  ?(i,j)’s  replace  a(j)6(j)’s  in  the  l.h.s.  of  (18)  and  h(i,j)’s  and  p(i,i)’s  replace  /(»)ll(j)’s 
and  c(i)d(j)’s,  respectively,  in  the  r.h.s  of  (18).  Furthermore,  Hi  replaces  Fi  ®  Gi  in  the  uppermost  partition  of 
(18)  and  H2  replaces  the  rest  of  the  three  lower  partitions  of  (18).  Once  again,  the  problem  will  be  divided  into 
two  optimization  problems.  First,  the  denominator  coefficients  will  be  found  by  optimizing, 

minllp^Xe(P^P)-'Xcp||^  (30) 

p 

where, 

X,  A  [Xfxl’  ..-Xlf  (31) 

and  each  Xj  is  formed  using  the  elements  in  the  V*  column  of  X  as 

Xi(l,k)  A  Xi(mim2 -I- mi  +  m2  +  /  -  k -bl)  for  i  =  1,2, . .  .,ib2-  (32) 

Once  p  is  found  the  numerator  coefficients  are  found  from  the  upper  partition  of  (30). 

V.  Simulation  Results 

In  this  Subsection,  we  present  two  numerical  examples  to  illustrate  the  effectiveness  of  the  proposed  algorithms 
for  design  of  denominator  separable  filters. 

Example  1:  Quarter- Plant  Gaussian  Filter 

Consider  a  “Gaussian  filter”  whose  impulse  response,  defined  over  the  first  quadrant,  is  given  by 


H{i,j)  =  0.256322  exp  [-0.103203{(i-  4)*  -I-  {j  -  4)^}] 
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where  (i,j)  €  Sj  and  the  support  S/  is  given  by 

S/  =  {(i,j)  \Q<i<Ni  0<j<M}. 


With  N  =  M  =  10  and  using  the  technique  proposed  above,  2-D  linear  shift-invariant  causal  (mi,mi)-th  order 
filter  with  the  transfer  function 


H{zi,Z2) 


C{zi)D(Z2) 


where  (3(21,2:2)  =  E"=o  E”=o  9(». ^(^1)  =  l  +  c(l)2f^-l- - 1- c(mi)2i  ”*>  and  £>(22)  =  H-d(l)22*-h 

- 1-  d(mi)22  ”**,  is  designed  to  approximate  the  impulse  response  of  the  Gaussian  filter.  Note  that  the  results 

presented  below  are  for  “strictly  proper”  filter,  i.e.  ni  =  mi  —  1.  We  have  compared  the  results  obtained  from 
our  algorithm  with  those  obtained  using  Hinamoto  and  Maekawa  [12]  for  different  order  realizations. 


Table  5.1:  Example  1:  Comparison  of  error  norms 


Order 

Proposed  Method 

7.0083E-01 

6.5181E-01 

3.6349E-01 

2.1721E-01 

1.7867E-01 

2.5051E-02 

5 

1.0192E-01 

4.5056E-04 

Example  2;  Ideal  Circular  Low- Pass  Filter 
The  impulse  response  was  generated  as 

H{i,j)  = 

where  Ji{  )  denotes  the  Bessel  function  of  the  first  kind  with  order  one  with  r  =  3.0.  The  support  region  is  again 
i,j  =  1, . . .,  11.  For  this  example,  we  designed  a  “strictly  proper”  as  well  as  “proper”  filter  for  various  orders. 
Note  that  although  the  proper  design  is  sub-optimal,  it  gives  better  results  than  the  optimal  strictly  proper  design. 
This  may  be  due  to  the  fact  that  we  are  able  to  match  greater  number  of  impulse  response  samples  exactly.  A 
comparison  of  the  error  norm  is  given  in  Table  5.2. 


Table  5.2:  Example  2;  Error  norms 


Order 

Strictly  Proper 

Proper 

8.7947E-02 

6.681  lE-02 

1.1528E-01 

6.3363E-02 

6.6971E-02 

4.2060E-02 

9.5587E-02 

3.7600E-02 

rJijry/i^  +J^) 
2xv^i*  +  p 
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SECTION  2.4  :  IDENTIFICATION  OF  DISCRETE  TIME  MULTIVARIABLE  SYSTEMS  FROM 
IMPULSE  RESPONSE  DATA 

Summary 

In  this  Section  we  consider  the  problem  of  identification  of  transfer  function  matrices  of  discrete  time  mul¬ 
tivariable  systems.  The  proposed  technique  obtains  an  optimal  approximation  from  the  given  (possibly  noisy) 
measured  impulse  response  data.  It  is  assumed  that  the  measured  impulse  response  data  corresponds  to  a  system 
with  a  strictly  proper  transfer  function  matrix.  Based  on  the  proposed  theoretical  basis,  an  efficient  computational 
algorithm  is  developed  and  illustrated  by  means  of  several  examples. 

1.  INTRODUCTION 

Mathematical  models  of  linear  systems  can  be  broadly  classified  as  non  parametric  and  parametric  models. 
Non-parametric  models  include  impulse  responses,  covariance  functions,  spectral  density  descriptions  etc.  These 
models  tend  to  be  infinite  dimensional  in  nature.  Parameterization  leads  to  finite  dimensional  models.  Some 
examples  of  parametric  models  are  differential  equations,  difference  equations,  transfer  functions,  state  space 
descriptions  etc.  In  parametric  modeling,  having  assigned  a  model  structure  to  the  system,  the  problem  is  to  find 
best  set  of  parameters  to  represent  the  system. 

The  problem  of  model  identification  of  single-input,  single-output  (SISO)  continuous  as  well  as  discrete-time 
systems  is  very  well  studied  [1]-[10].  However,  despite  the  importance  of  the  problem  of  identification  of  multi¬ 
input,  multi-output  (MIMO)  systems,  relatively  small  proportion  of  the  existing  literature  addresses  it.  This  is 
due,  in  parts,  to  the 

1 )  non-uniqueness  in  the  parameterization  of  multivariable  systems, 

2)  difficulty  in  determining  a  cost  function  that  reflects  appropriately  the  importance  of  various  input  output 

pairs  and 

3)  limited  success  in  extending  the  well  established  results  from  SISO  system  theory  to  multivariable  systems. 

In  recent  years,  several  authors  have  investigated  the  problem  of  parameterization  (for  the  purpose  of  identi¬ 
fication)  and  identification  of  MIMO  systems.  Choice  of  parameterization  was  discussed  by  Glover  and  Willems 
[11],  Denham  [12]  and  Gevers  and  Wertz  [13],  where  it  was  shown  that  knowing  the  order  of  the  system,  a  minimal 
set  of  parameters  that  uniquely  define  the  system  can  be  identified.  The  problem  of  MIMO  system  identification 
has  been  addressed  by  several  researchers  using  several  approaches;  Among  others,  Moonen  and  Vandewalle  [14] 
developed  a  quotient  SVD  framework  for  identifying  state  space  models  from  the  input-output  error  covariance 
matrix.  Helmicki,  Jacobson  and  Nett  [15]  and  Gu  and  Khargonekeir  [16]  have  developed  robustly  convergent  in 
H°°  framework.  M^d(ila  [17]  uses  Laguerre  series  for  identification  in  H°°  framework  and  Rao  [18]  uses  Walsh 
functions  for  identification  of  multivariable  systems.  In  the  wide-sense  stationary  random  process  framework, 
Friedlander  presents  a  modified  Yule- Walker  method  for  estimating  the  multi-channel  ARM  A  parameters  in  [23]. 
It  should  be  emphasized  that  the  above  references  are  only  some  of  the  most  recent  ones  appearing  in  literature 
and  those  which  address  the  multivariable  identification  problem.  For  the  SISO  systems,  references  [1]-[10]  provide 
an  excellent  exposition  for  the  solution  of  the  identification  problem  and  also  contain  extensive  bibliography. 

In  this  work,  we  study  the  problem  of  determining  a  parametric  model  (a  discrete  time  transfer  function 
matrix)  from  the  given  impulse  response  data.  We  will  assume  that  the  system  that  we  wish  to  identify  is 
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represented  by  a  (p  x  m)  rational  function  matrix; 

aij(z)  •  •  ■ 

Olm(2) 

=  W) 

071(2) 

077(2)  •  •  ■ 

07m(2) 

_  A(z) 
-  b(z) 

Op7(2)  ■ -  • 

0pm  (z). 

where  deg(6(z))  =  n  and  A(2)  is  a  (pxm)  polynomial  matrix  such  that  deg(aij(2))  =  n— 1,  i(j)  =  1,2,...,  p(m). 
Further,  it  is  assumed  that  the  first  N  terms  of  the  measured  (possibly  noisy)  unit  pulse  response  data  of  the 
system 

N-i 

H(z)  =  53  (1.2) 

•=o 

are  available  where  H,  represents  the  matrix  of  impulse  responses  at  the  i-th  time  instant. 

It  is  well  known  that  even  for  a  single  input,  single  output  system,  when  the  unknown  system  contains 
both  poles  and  zeros,  the  problem  of  identification  of  numerator  and  denominator  polynomial  coefficients  is  a 
highly  non-linear  optimization  problem.  Two  of  the  techniques  that  have  been  used  frequently  in  parameters 
identification  of  scalar  plants  in  signal  processing  literature  are  those  proposed  by  Steiglitz  and  McBride  [19]  and 
Evans  and  Ficshl  in  [20].  Evans-Fischl’s  approach  minimizes  the  difference  (in  the  least  square  sense)  between 
the  measured  and  the  desired  impulse  response  data,  while  Steiglitz  and  McBride  approach  uses  linearized  error 
criteria.  In  this  respect,  provided  the  degree  of  the  numerator  polynomial  (m)  is  one  less  than  the  degree  of  the 
denominator  polynomial  (n),  Evans-Fischl  approach  can  be  considered  to  be  “optimal”. 

The  primary  purpose  of  this  work  is  to  generalize  Evans-Fischl  method  (EFM),  to  the  case  when  the  number 
of  inputs  and  output  is  greater  than  one.  We  propose  a  generalized  error  norm  measure  by  giving  equal  weight 
to  impulse  response  corresponding  to  each  input/output  pair.  Based  on  this  error  norm,  a  single  denominator 
polynomial  with  a  pre-specified  degree  is  computed.  Knowing  the  coefficients  of  the  denominator  polynomial,  the 
numerator  polynomials  are  evaluated  by  solution  of  linear  algebraic  equations. 

The  layout  of  this  Section  is  as  follows:  Since  Evans-Ficshl  technique  is  not  very  well  known  in  the  control 
systems  literature,  in  Subsection  2,  we  briefly  review  this  approach.  In  Subsection  3,  the  error  criteria  for 
multivariable  system  is  defined  and  the  error  minimization  technique  is  extended  to  multi-input  multi-output 
systems.  In  Subsection  4,  the  identification  results  from  extensive  simulations  on  various  order  and  veirying 
degree  of  noise  contamination  are  presented. 

2.  Scalar  Systems 


Assume  that  the  given  single  input,  single  output  plant  is  described  by  a  strictly  proper  stable  z-domain  transfer 
function: 


H(z) 


oq  +  aiz~*  H - + 

1  +  6iz“*  H - 1-  -I-  b„z~"  ’ 


(2.1) 


where  the  coefficient  of  z°  term  in  denominator  has  been  assumed  to  be  unity  without  any  loss  of  generality. 
Using  long  division,  the  above  transfer  function  can  be  rewritten  as  the  infinite  series 


H(z)  =  ho  +  hiz-^ +  ■■■-(-  h„z-’'  +  h„+,z-("+‘) 


(2.2) 


Define  vectors  f ,  h  6  IR^,  where. 


f  =  l/o  /i 

h  =  [ho  hi 


(2.3) 
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denote  the  N  samples  of  the  measured  and  the  actual  impulse  response  data,  respectively.  Then,  the  identification 
of  parameters  a,  and  6^  can  be  stated  as  the  following  least  squares  minimization  problem: 

rjv-i  1^/* 

min||e||  A  min|gc?|  (2.4) 

where. 


e  A  f-h 

(2.5a) 

a  A  [ao  ai 

••  a„_if 

(2.56) 

b  A  [1  61  -  - 

•  b„f. 

(2.5c) 

The  transfer  function  coefficients  are  related  to  the  impulse  response  samples  in  H(2)  as 


a 

Hi' 

0 

.Hz. 

where,  a,  b  have  been  dehned  in  (2.5)  2uid 

r  ho 

0 

.  .  . 

0 

Hi  A 

hi 

ho 

.  . 

0 

; 

-  ^0  —  1 

bn- 

2  *'* 

bo 

r  hn_i 

• . 

bo 

hn+1  h„ 

... 

hi 

Hj  A 


hff-n-iJ 


(2.6) 


(2.7a) 


(2.76) 


If  b  and  Hi  are  known,  then  a  can  be  found  by  solving  the  system  of  linear  algebraic  equations  a  =  Hib. 
However,  in  the  present  case,  the  exact  h  and  therefore,  the  matrices  Hi  and  H2  are  not  known.  Therefore,  we 
replace  the  elements  of  Hi  and  Hj  by  the  corresponding  matrices  Fi  and  formed  from  the  measured  impulse 
response  data  f.  To  obtain  the  initial  estimate  for  b,  consider  the  lower  half  of  (2.6)  given  by  H2b  =  0.  Replacing 
H2  by  F2  and  expanding  the  relation,  we  get 


/n  fn-l 
fn+l  fn 

In-I  fN-7  /w-3 


=  d(b), 


(2.8) 


where  <i(b)  is  the  equation  error.  The  above  equation  can  be  rewritten  as 


fn—l  fn-7 

fo 

rtii 

■  fn  ■ 

fn  fn-l 

...  /, 

62 

—  ^ 

fn+l 

-//V-2  //V-3 

•••  fpr-n-l- 

.6n. 

■fn-l- 

+  d(b). 


(2.9) 
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Now,  let 


’  /n-1  /n-2 

/o 

•  fn  • 

fn  /n-1 

...  /, 

and  g  = 

/n+1 

;  ; 

; 

-/jV-2  /Af-3 

•  ftf-n-l- 

-fs-i- 

the  initial  estimate  for  b  can  be  obtained  by  minimizing  ||d(b)||  with  respect  to  b  =  [6i  62 
computed  as 

1 

b(°)  = 

-Gtg 


(2.10) 


6„]^  and  can  be 


(2.11) 


where  ct  denotes  the  pseudo-inverse  of  G.  In  general,  the  first  approximation  is  fairly  crude  one  because  It  only 
minimizes  an  equation  error  and  does  not  minimize  the  actual  fitting  error  norm  of  (2.4).  Unlike  the  equation 
error,  the  fitting  error  will  be  shown  to  be  non-linearly  related  to  b  and  hence  it  has  to  be  refined  iteratively  to 
obtain  a  better  denominator  polynomial  to  match  the  impulse  response. 

Note  that  in  (2.8),  if  the  exact  impulse  response  b  had  been  known,  the  equality  will  be  satisfied.  However, 
due  to  measurement  noise,  there  will  be  some  residual  error  d(b)  as  shown  in  (2.8).  This  equation  error  is  now 
rewritten  as: 


d(b)  A  F2b 

bn  6„-,  •••  1 

bfl  b,]_i  1 

A  B^’f.  (2.12) 

Now,  f  can  be  represented  in  terms  of  e  and  h  as  f  =  h  -|-  e  and  (2.12)  can  be  expressed  as 

d(b)  =  B’’[h-»-e] 

=  B^e  because  Hjb  =  B^h  =  0.  (2.13) 

Rewriting  the  error  to  be  minimized  in  terms  of  equation  error  and  using  the  “projection  theorem”  [21],  we  get 
(some  more  explanation  of  the  rationale  behind  this  approach  is  given  later  for  the  multi-channel  case), 

e  =  B(B’'B)-*B^f 
A  WB^’f 
=  WFzb 
=  W[g  G]b 

=  Wg  WGb  (2.14) 

where  b  =  [61  62  •  •  •  bnY ■  Rrom,  (2.15),  it  is  clear  that  WGb  =  —  Wg  -|-  e  and  a  new  expression  for  b  can  be 
obtained  by  minimizing  ||e||  as: 


b  =  -(WG)tWg 

=  -  (G^W^WG)"^  G^'W^’Wg. 


(2.15) 
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In  summary,  the  estimate  for  b  is  iteratively  refined  using  the  following  relation; 

1 

bC)  =  . 

-[VG)-^[V]g. 

where, 


V  A  [G^W^W] 

W  A  [B(B^B)-*1  and 


(2.16) 


(2.17a) 

(2.176) 


(2.17c) 


It  should  be  mentioned  here  that,  at  each  iteration,  the  new  improved  estimates  of  b  are  used  in  forming  the 
matrix  W.  Since  the  above  iterations  minimize  the  exact  fitting  error  of  (2.4),  at  convergence,  the  optimal 
estimate  of  b  is  found.  The  iterations  are  performed  in  two  phases.  The  scalar  case  being  only  a  speciad  case, 
these  two  phases  are  explained  with  more  details  in  Subsection  3. 

When  the  estimates  of  b  converge,  a  can  be  computed  directly  as  a  =  Hib,  where  Hi  has  the  same  form  as 
Hi ,  except  the  elements  hi  are  replaced  by  /<  —  Cj ,  i  =  and  c,  are  the  elements  of  the  error  vector  e  = 

[co  Cl  •  •  •  Civ-i]^  =  WB^f.  Here  WB^  are  formed  using  the  optimized  values  of  b. 

3.  Multivariable  Systems 


In  this  Subsection  the  EFM  algorithm  is  generalized  for  the  Multi-Input/Multi-Output  (MIMO)  case. 

Assume  that  the  given  plant  is  described  by  the  rational  transfer  function  matrix  G(z)  in  (1.1).  Denoting 
each  (i,  j)-th  element  of  this  matrix  as  Hij ,  G(z)  may  also  be  written  as. 


G(z)  = 


■Hii(z)  Hi2(z) 

H22{z) 

.Hpi(z)  Hp2{z) 


Hlmiz)- 

H2n,iz) 
ffpm(z)  - 


where  each  Hij(z)  is  given  by. 


„  f  .  _  a.j(0)  +  o<j(l)^~^  +  •  -f  a.j(T»  -  l)z 

■  1  +  6y  (l)z-»  +  •  • .  +  bij{n  -  l)z-("-D  +  6.-,(t.)z-'‘  ' 


Now,  similar  to  (2.2),  Hij{z)  can  also  be  written  as. 


Hij(z)  =  A„(0)  +  A<i(l)z-^  +  -  ■-l-/»,;(n)z-"+  -- 


Let  f,;  ,  hjj  €  where. 


.ft;  =  [/O(0)  yi;(l) 

ho  =  [Ao(0)  hij{l)  ■■■  hii{N-l)f, 


(3.4) 
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denote  the  N  samples  of  the  measured  and  the  desired  impulse  response  data,  respectively,  corresponding  to  the 
tj-th  polynomial  element  of  the  G(z)  matrix.  It  should  be  mentioned  here  that  the  number  of  measured  samples, 
N,  has  been  assumed  to  be  equal  for  each  case  without  any  loss  of  generality  and  that  the  algorithm  can  be 
modified  easily  if  the  number  of  measurements  are  unequal.  The  only  restriction  is  that  for  each  case,  there  must 
be  at  least  2n  measurements,  n  denoting  the  order.  Next,  defining  the  error  vector  for  the  tj-th  case  as. 


the  whole  error  matrix  may  be  written  as. 


h.;. 

eii 

ei2 

'  * '  6inri 

621 

622 

•  •  •  62m 

■  epi 

1 

ep2 

1 

•  ■  ■  6pm 
.  1  ■ 

ei 

62  '• 

©m 

Then,  the  least-squares  minimization  problem  can  be  stated  as 

m  JV  1 

min  ||E||f  A  min 

where  ||  •  H/-  denotes  the  Frobenius  norm  (or  the  matrix  ^2  norm)  and 

Bij  =  [ao(0)  ay(l)  •••  o,;(n-l)f 

b  =  [1  6(1)  ...  6(n)f . 

For  ease  of  formulation,  a  large  vector  of  errors  is  created  from  E  next.  Define, 


(3.6a) 


(3.8a) 

(3.86) 


e,  A  Vec  [E]  = 


(3.9a) 


where  yec[  ]  is  the  operation  of  stacking  the  columns  of  a  matrix  to  form  a  large  vector.  Note  that  e  is  a  pmN  x  1 
vector  which  can  be  related  to  the  measured  and  the  desired  impulse  responses  as. 


where 


ei  =  ft  —  hj. 


and  h|  A 


(3.9c) 
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With  this  definition,  the  least-squares  minimization  criterion  of  (3.7)  can  be  restated  as 


min  i|e,l|2  A 

{•.y}.b  = 


min 


tpmN 


53  c?(i) 


1/2 


(3.10) 


where  ||  ■  ||2  denotes  the  vector  ^2  norm.  Now,  corresponding  to  each  iji-th  element  in  G(2)  matrix,  we  can  form 
similar  to  (2.6), 

faiil  fH?,! 

(3.11) 


H/i' 

0 

where,  ,  b  have  been  defined  in  (3.8)  and 


Ho  A 


H?.  A 


r  /‘.■i(o) 

0 

0 

hii(l) 

Ao(0) 

0 

.hij(n  - 

1)  hij(n-2) 

•  MO) 

^o(«) 

hij(n-l)  ••• 

Ao(0) 

(n  +  1) 

hijin) 

Ao(i) 

(N-l) 

hij(N~2)  ••• 

hij{N  -  n 

61R' 


(n)x(n+l) 


€1R^^  -n)x(n+l) 


Now,  Stacking  the  upper  partitions  of  (3.11)  for  all  i,j  we  get. 


(3.12a) 


(3.126) 


a{  A 


an  ■ 

“11 

*21 

h(i) 

“21 

„(») 

“pi 

aim 

a2m 

apm  . 

TlO) 

^  ^pm  J 

b  A  Hp^b. 


(3.13a) 


Similarly,  the  lower  partitions  of  (3.11)  may  also  be  stacked  as, 


S' 

r  -1 

B’'h2i 

0 

h(2) 

“pi 

b  A  H^^b  = 

B’’hpi 

0 

0 

H(2) 

“im 

h(2) 

“2m 

B^hi„ 

B^hjm 

.  0  . 

L  npm  J 

.B’’hp„. 

=  (Ip„  ®  B^)li, 


(3.136) 
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where,  0  denotes  the  matrix  Kronecker  product,  B  is  defined  in  (2.17c)  and  denotes  an  pm  x  pm  identity 
matrix.  If  the  elements  in  and  b  are  known,  an  can  be  found  uniquely  from  (3.13a).  Replacing  the  Hjj’s 
with  the  corresponding  F,;  ’s,  an  equation  error  is  formed. 


d;(b)  A  Fp^b  A 


(3.14) 


Following  the  steps  analogous  to  (2.8)-(2.11),  one  can  again  find  an  initial  estimate  of  b  as  follows, 


b^o)  = 


-G}gi 


(3.15) 


where,  g;  contains  the  first  column  of  F^^  and  G/  contains  the  rest  of  the  columns.  But  in  order  to  find  the 
optimum  estimate  of  b  we  still  have  to  optimize  the  criterion  in  (3.10).  To  proceed  in  that  direction  the  equation 
error  in  (3.14)  is  rewritten  in  a  more  useful  form  as. 


d,(b)  = 


B^fn 

B^f21 


Replacing  f|  by  ej  +  h|  and  using  (3.13b)  we  get. 


d,(b)  =  (Ipm®B’’)e, 


(3.16) 


(3.17) 


But  in  order  to  facilitate  the  minimization  of  the  fitting  error  norm  of  (3. 10),  we  have  to  find  an  inverse  relationship 
between  e;  and  d/(b).  According  to  orthogonality  principle,  the  error  e/  for  a  given  b  and  corresponding  to  the 
optimum  a;  must  be  orthogonal  to  the  desired  response  vector  h/.  Otherwise  there  would  remain  some  information 
contained  in  the  non- zero  projection  of  ei  onto  hj.  The  complete  orthogonal  basis  space  of  this  error  can  be  found 
from  equation  (3.13b)  which  clearly  demonstrates  that  the  pm{N  —  n)  columns  of  (Ipm  ®  B)  are  orthogonal  to  hf. 
Hence  the  error  e((a,')  corresponding  to  optimal  ai  may  be  formed  as  a  linear  combination  of  all  its  orthogonal 
basis  vectors  as  follows, 

e,(ar)  A  (Ip„.0B)c  (3.18) 


where  c,  a  vector  of  unknown  constants,  needs  to  be  determined.  By  plugging  in  (3.18)  in  (3.17)  we  obtain 


d,(b)  =  (Ip,n®B^)(Ip„®B)c 
=  (Ip„®B’’B)c. 


(3.19a) 

(3.196) 
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Hence, 


C  =  (Ipm  ®  (B’’B)-Od,(b) 

=  (Ipm  ®  (B^B)-»B’’)f,  [using  (3.16)].  (3.20) 


Plugging  this  back  in  (3.18)  and  following  the  similar  steps  as  in  (2.14), 


e,(ar)  =  (Ip,„®B(B’’B)-^B’’)f,  (3.2ia) 

=  (Ipm  ®  B(B^B)-')(Ip„  ®  B^)f,  (3.216) 

A  W,(Ip„®B^)f,  (3.21c) 

=  WiFp^b;  [using  (3.14)]  (3.21d) 

AW,[g,  G,]b  (3.21e) 

=  W,g,  +  W,G,b.  (3.21/) 

For  an  optimum  a,  this  is  the  fitting  error  that  we  need  to  minimize,  i.e.,  the  criterion  in  (3.10)  is  exactly 
equivalent  to, 

rriin  ||e,(a;)||2.  (3.22) 

b 


At  minimum  this  will  produce  e(a* ,  b’).  Note  that  in  the  expression  of  ej(aj),  the  matrix  W  does  have  dependence 
on  b  and  b  contains  the  elements  with  respect  to  which  the  criterion  needs  to  be  minimized.  Following  EFM,  the 
minimization  of  (3.22)  is  performed  in  two  steps  of  iterations.  In  the  first  phase  of  iterations,  the  matrix  W  is 
constructed  from  the  estimate  of  b  found  at  the  previous  iteration.  The  new  update  of  b  is  then  obtained  from. 


b(«+i)  =  -  (Wj‘'^G,)tw|‘^g, 

=  -  (G?'wr^‘Vp>G,)"*Gfwr^*V[’>g,. 

In  summary,  the  estimate  for  b  is  iteratively  refined  using  the  following  relation: 


where, 


b(*+i) 


1 

.-[vWGi]-MvW]g,. 


=  G?’(B^^’^B^‘))~' 


(3.23) 


(3.24) 


(3.25a) 

(3.256) 


It  should  be  mentioned  here  that,  at  any  iteration  step  (i  +  1),  the  new  estimates  of  b^'^  are  used  in  forming  the 
matrix  W^'l.  Note  that  the  initial  estimate  b^°)  comes  from  the  equation  error  minimization  step  of  (3.15). 

The  firit  phase  alone  may  not  converge  to  the  absolute  optimum  of  b  that  minimizes  e(ap  completely  though 
our  experience  with  many  examples  does  indicate  that  the  first  phase  comes  quite  close  to  the  optimum.  In  some 
cases,  especially  when  the  deviations  of  the  measured  responses  from  the  desired  ones  are  relatively  large,  a  second 
phase  of  EFM  needs  to  be  invoked.  In  the  second  phase  of  iterations,  the  variation  of  W  w.r.t.  b  is  also  t.^ken 
into  account.  The  details  of  phase  2  for  the  scalar  case  may  be  found  in  [20].  The  extension  for  the  multi-channel 
case  is  similar  to  the  development  of  the  extension  for  the  first  phase  given  above. 
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At  convergence  of  the  second  phase,  the  optimum  value  b*  is  found.  Plugging  that  in  (3-21)  the  optimal 
error  vector  e(a*,b*)  is  computed.  The  optimal  impulse  response  hj  is  then  found  from  (3.9b)  as, 


h,  =  f,  -  e(a;,b*). 


(3.26) 


Finely,  the  optimal  &i  is  computed  from  (3.13a)  as 

a;  =  h{‘V,  (3.27) 

where  has  the  same  form  as  except  that  the  elements  /»i;(/fc)  are  replaced  by  the  corresponding  hj;(i). 
We  should  mention  here  that  the  separable  optimization  of  a  and  b,  as  given  here,  falls  within  a  special  class 
of  non-linear  optimization  problems  which  have  been  studied  extensively  by  numerical  analysts  [22].  It  has  been 
shown  in  [22]  that  if  some  of  the  unknown  variables  are  linearly  related  to  the  error  and  the  other  variables  are 
non-linearly  related  and  if  the  variables  do  separate  as  the  case  studied  here,  the  two  step  optimization  do  produce 
the  optimum  for  both  sets  of  variables. 


We  should  mentiou  here  that  the  separable  optimization  of  a  and  b,  as  presented  in  this  paper, 
falls  within  a  special  class  of  non-linear  optimization  problems  which  have  been  studied  exteiisively 
by  numerical  analysts  [22].  It  was  shown  by  Golub  and  Pereyera  in  [22]  that  if  some  of  the  unknown 
variables  are  linearly  related  to  the  error  and  the  other  variables  are  non-linearly  related  and  if  the 
variables  do  separate  as  the  case  studied  here,  then  the  two  level  optimization  of  the  kind  described 
in  the  preceding  sections  produces  the  optimum  values  for  both  sets  of  variables. 

4.  Simulation  Results 


A  (2  X  2)  transfer  function  matrix  was  used  for  simulations.  Table  4.1  contmns  the  coefficients 
of  the  denominator  polynomial  6(c)  and  Tables  4.2(a)  and  (b)  contain  the  coefficients  of  the 
numerator  polynomials. 

Table  4.1:  Coefficients  of  denominator  of  G(c) 


vmim 

Denominotor 

4k 

l.OOOOOOOOOOOOOOOe  +  00 

.t 

* 

-2.239200000000000e  +  00 

.4 

* 

1.682120780000000e  +  00 

-3 

A 

-4.67524536o940000c  -  01 

-2 

* 

5.995215o08524930c  -  02 

.1 

4k 

-3.06914821 1131338e  -  02 

•0 

Ar 

7.921660299005685f  -  03 

Table  4.2(a):  Coefficients  of  numerators  of  H(z) 

Coeff 

Ol2(*) 

2.310000000000000e  -  02 

-3.829980000000000€  -  02 

-6.4655160000000006  -  01 

Wm 

2.825354370300000e  -  02 

3.900814928800000e-01 

-1.163969877oll980c  -  02 

-1.1241974442902406  -  01 

Hfl 

2.305936357609712€  -  03 

1.5344721310690056  -  02 

■B 

-1.5992937951067996  -  04 

-7.808982566335611c  -  04 
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Table  4.2(b):  Coefficients  of  numerators  of  H(2) 


Coeff 

<*21  (s) 

022(2) 

-2.120000000000000c  -  01 

2.3l8000000000000e  -  01 

2.986020000000000e  -  01 

-2.80516l954172887e-  01 

-1.568783612400000e-01 

1.031386276967342e  -  01 

HI 

5.140274245157200c  -  02 

-3.466242652862272c  -  03 

Bl 

-8.602449814804975e  -  03 

-4.646062737792217c  -  03 

WSm 

4.959796217794915c -04 

6.208784880946757c -04 

The  ori^nal  system  is  of  6-th  order  and  has  2  inputs  and  2  outputs.  Using  the  algorithm 
developed  in  Section  4,  we  generated  transfer  function  matrices,  such  that  each  element  of  the 
estimated  transfer  function  matrix  had  orders  5, 4  and  3.  A  comparison  of  the  impulse  responses  of 
the  lower  order  approximation  with  the  actual  one  is  expressed  as  SNR  in  second  column  of  Table 
4.3.  Further,  in  Figures  1(a),  2(a),  3(a).  4{a)  and  5(a)  we  have  plotted  the  actual  unit  pulse 
response,  and  absolute  errors  corresponding  to  approximation  of  order  6.  5,  4  and  3  respectively. 
The  low  maagnitudes  of  the  errors  clearly  indicate  the  effectiveness  of  the  proposed  technique. 

The  impulse  response  of  the  transfer  function  matrix  was  then  corrupted  by  random  noise  such 
that  the  SNR  was  20.5  dB.  The  noisy  impulse  response  was  used  for  estimating  the  parameters  of 
the  transfer  function  matrix.  The  SNR  for  the  estimated  system  is  tabulated  in  the  third  column 
of  Table  4.3  below.  Note  that  in  computing  the  various  <i,j(r)’s  the  algorithm  uses  the  measured 
impulse  response  data.  Therefore,  it  tries  to  match  the  noisy  impulse  response  data.  To  make  a  fair 
evaluation  of  the  performance  of  the  algorithm,  the  SNR  for  the  noisy  case  is  computed  by  using 
the  traihng  elements  of  the  estimated  and  the  unit  pulse  response  of  the  original  transfer  function 
matrix.  In  Figures  1(b).  2(b),  3(b),  4(b)  and  5(b),  we  have  plotted  the  noisy  pulse  response  data 
and  respectively  the  6-th.  5-th.  4-th  and  the  3-rd  order  approximations. 


Table  4.3:  Simulation  Results 


Order 

SNR  (dB) 
Noiseless 

S.NR  (dB) 
Noisy 

6 

96.6577 

30.1450 

5 

32.9529 

4 

45.9135 

33.4319 

3 

31.5113 

30.7571 

Extensive  simulations,  with  various  SNR  values  show  close  approximation  to  noiseless  impulse 
response.  The  results  obtained  using  the  proposed  method  consistently  show  better  performance 
compared  to  the  existing  methods. 
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5.  Concluding  Remarks 

In  this  paper  we  addressed  the  problem  of  identification  of  discrete  time  transfer  function  matrices 
from  the  noisy  unit  pulse  response  data.  The  proposed  method  is  a  generalization  of  an  existing 
technique  that  ^timates  the  parameters  of  a  discrete  time  scalar  transfer  function.  The  simulation 
results  presented  here  and  extensive  experience  with  the  proposed  scheme  clearly  indicate  that  it 
can  be  reliably  used  for  estimating  the  parameters  of  discrete  time  transfer  function. 
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APPENDIX 

Computational  Algorithm  :  Phase  II 


In  this  appendix  the  second  phase  of  the  iterative  algorithm  is  described  in  detail.  In  this  phase,  the  derivative 
of  the  matrix  W;  w.r.i.  b  is  taken  into  account.  The  complete  error  expression  is  rewritten  below, 

||e,(a;.b)||i  =  ena;.b)e,(ar,b).  (A.l) 

By  setting  the  derivative  of  this  squared  norm  to  zero,  we  obtain  the  updated  at  the  (i+  l)-th  iteration  as, 

b('+‘)  =  -  (A.2) 

where  (suppressing  the  superscript  ^’l), 

U,  A  L?’W/  +  G,W/'W,,  (A.2a) 


I  A 

=  [a6(i) 


raw,...,  I  l^w,  1 

aw,  ,  ^  aw  ^ 

db(k)  =  a6(i)J’ 


(yl.2a) 

(A.26) 

(A.2c) 


dbiky 

-  w 


[aK/fc)J®  ® 


and  (A.2d) 


has  the  same  form  as  the  B  matrix  defined  in  (2.17)  but  filled  with  all  zeros  except  at  the  locations  where 
6(ifc)  appear.  For  example. 


(A.2e) 


1 

0 

...  O' 

0 

1 

...  0 

0 

0 

...  1 

0 

0 

...  0 

; 

0 

.0 

0 

0. 

Once  b^*+*'  is  found,  b^'+'l  can  be  found  from, 

b(‘+i>  = 


1 

b«+» 


-[U{‘>G,]-I[u|‘>]g, 


(A.3a) 

(A.36) 


This  minimization  phase  continues  until  b’"''*  ~  b’  is  reached  and  this  optimum  b*  vector  corresponds  to  a 
minimum  of  the  error  surface  of  ||e,(aj ,  b)]]^. 
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CHAPTER  3 

SPECTRUM  ESTIMATION  AND  RELATED  TOPICS 

SECTION  3.1  ;  A  NOVEL  CYCLIC  ALGOHJTHM  FOB.  MAXIMUM  LIKELIHOOD  FBJBQUENCY 

Estimation 

SUMMAB.Y 

An  algorithm  for  estimation  of  frequencies  of  narrow-band  sources  from  noisy  observation  data  is  presented  in 
this  Section.  For  Gaussianly  distributed  noise,  the  algorithm  produces  maximum  likelihood  estimates,  otherwise 
least-squares  estimates  are  obtained.  The  proposed  algorithm  is  iterative  and  at  each  step  of  iteration,  the 
optimization  is  w.r.i.  a  single  frequency  only  and  hence,  simple  hardware/software  (using  FFT,  e.g.)  is  sufficient 
for  implementation.  The  performance  of  the  algorithm  has  been  compared  with  the  theoretical  Cramer-Rao 
bounds. 

I.  Introduction  : 

Estimation  of  frequencies  from  data  composed  of  multiple  narrowband  signals  in  noise  is  one  of  the  oldest  as 
well  as  a  current  research  problem  that  is  of  great  interest  in  several  branches  of  science.  In  the  recent  years  several 
techniques  that  produce  optimal  estimates  have  been  developed.  Unfortunately,  the  optimal  techniques  are  based 
on  computation  intensive  nonlinear  optimization  procedures.  Hence  the  optimal  techniques,  though  theoretically 
sound,  have  seen  limited  practical  usage.  In  fact,  most  of  the  well-known  and  established  frequency  estimation 
algorithms  are  actually  suboptimal.  The  suboptimal  algorithms  are  popular  because  they  can  be  implemented 
relatively  inexpensively  and,  except  at  low  SNR,  they  perform  equally  effectively  [Tufts  and  Kumaresan,  1982]. 
But  if  one  is  interested  in  real-time  computation,  especially  at  very  high  S2uiipling  rate  needed  for  high  frequency 
applications,  both  the  optimal  as  well  as  the  suboptimal  techniques  would  require  rather  expensive  special-purpose 
hardware/software.  The  motivation  of  the  this  work  was  to  investigate  the  possibility  of  devising  an  algorithm 
that  would  rely  on  off-the-shelf  hardwMe/software  for  implementation  but  would  still  be  optimal  in  the  Maximum 
Likelihood  (ML)  sense. 

It  is  well  known  that  if  the  observation  data  is  composed  of  a  single  sinusoid  in  gaussianly  distributed  noise, 
the  peak  of  the  periodogram  corresponds  to  the  maximum  likelihood  (ML)  estimate  of  the  unknown  frequency 
[Palmer,  1974;  Rife  and  Boorstyn,  1974,  1976].  The  hardware  or  software  implementation  of  the  periodogram 
is  based  on  the  Fast  Fourier  Transform  (FFT)  [Cooley  and  Tukey,  1965]  which  is  a  highly  efficient  but  simple 
algorithm.  Because  of  this  simplicity,  the  periodogram  indeed  is  the  main  workhorse  for  most  practical  applications 
even  when  more  than  one  sinusoids  are  present.  In  case  of  multiple  sinusoids,  the  effectiveness  and  applicability 
of  periodogram  is  greatly  diminished  unless  the  unknown  frequencies  are  well  separated.  The  periodogram  peaks 
in  such  cases,  in  general,  do  not  correspond  to  the  ML  estimates.  In  fact,  if  the  sepetration  between  two  adjacent 
frequencies  is  less  than  the  FFT  bin  width,  a  plot  of  the  periodogram  would  only  show  one  merged  peak  instead 
of  two  distinct  ones. 

Overcoming  the  resolution  limit  of  the  periodogram  has  been  a  major  focus  of  research  over  the  past  decade. 
Periodogram  is  basically  a  brute-force  method  which  does  not  make  any  explicit  use  of  the  exponential  nature 
underlying  the  multiple  sinusoids  data.  In  contrast  to  that,  the  modern  high-resolution  techniques  exploit  the 
known  information  about  the  exponential  character  of  the  observed  data  and  assume  an  appropriate  model,  either 
implicitly  or  explicitly.  The  problem  then  is  converted  to  a  multidimensional  search  over  the  parameter  space 
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of  the  chosen  model.  In  some  cases  [Parthasarathy  and  Tufts,  1985;  Rife  and  Boorstyn,  1976],  the  unknown 
frequencies  may  themselves  constitute  the  parameter  search  space.  But  these  direct  methods  are  based  on  the 
majcimization  of  optimal  criterion  which  require  non-linear  optimization. 

One  of  the  major  objectives  in  developing  suboptimal  techniques  has  been  to  come  up  with  linear  solution  for 
this  essentially  non-linear  problem.  The  origin  of  development  in  this  direction  may  be  found  in  the  algorithm  due 
to  Prony  [1795].  Complex  exponentials  may  be  considered  to  be  the  roots  of  a  Linear  Predictor  (LP)  polynomial 
and  the  estimation  of  the  LP  coefficients  (or  equivalently,  the  parameters  of  rm  Auto-Regressive  [AR]  model)  is 
a  linear  problem  [Makhoul,  1975].  Realizing  this  unique  property  of  complex  exponentials,  enormous  research 
effort  has  concentrated  on  this  particular  idea  [see  Jackson  et  al,  1978;  Tufts  and  Kumaresan,  1982;  Lang  and 
McClellan,  1980;  Ulrych  and  Bishop,  1975;  among  others].  The  roots  of  the  estimated  LP  p'^lynomial  are  the 
estimates  of  frequencies.  It  should  also  be  noted  that  the  parallel  development  of  the  maximum  entropy  method 
[Burg,  1967],  which  is  based  on  a  completely  different  theoretical  viewpoint,  essentially  produces  exactly  same 
results  as  AR  modeling.  These  methods  have  been  shown  to  be  significantly  more  effective  when  the  corrupting 
influence  of  noise  in  the  observed  data  or  in  the  correlation  matrix  is  reduced  by  the  incorporation  of  Singular 
Value  Decomposition  (SVD)  or  Eigen-Decomposition  (ED)  as  the  case  may  be  [Tufts  and  Kumaresan,  1982;  Kay 
and  Shaw  [1988],  Kung  et.  al.  [1983]  and  others].  The  ED  of  the  correlation  matrix  and  the  SVD  of  the  data 
matrix  composed  of  the  multiple  frequencies  in  noise  data  possess  certain  signal/noise-subspace  orthogonality 
properties  which  were  also  effectively  exploited  by  many  researchers  for  frequency  estimation  [Pisarenko,  1972; 
Owsley,  1978;  Schmidt,  1979;  Bienvenue  and  Kopp,  1979;  Reddi,  1979;  Kumaresan  and  Tufts,  1983,  Kung  et.  al. 
[1983]  and  others].  More  recently,  a  structured  matrix  approximation  based  method  has  been  proposed  which 
essentially  reparameterizes  the  maximum  likelihood  criterion  to  depend  on  LP-type  coefficients  [Kumaresan  and 
Shaw,  1985,  1988;  Kumaresan,  Scharf  and  Shaw,  1986;  Shaw,  1987].  This  reparameterization  leads  to  im  iterative 
method  where  the  problem  is  inherently  made  linear  at  every  iteration  step.  As  one  would  expect,  this  ML 
algorithm  gives  accurate  frequency  estimates  even  at  low  SNR.  References  to  various  other  methods  of  frequency 
estimation  may  be  found  in  the  papers  cited  here. 

The  main  goal  of  the  present  work  is  to  integrate  the  positive  aspects  of  both  periodogram  as  well  as  the 
model  based  estimation  techniques.  In  this  work,  we  propose  an  iterative  algorithm  which  achieves  this  objective 
by  essentially  splitting  the  multidimensional  non-linear  optimization  problem  of  MLE  into  several  one-dimensional 
searches.  The  exact  model  of  the  multiple  complex  exponential  data  is  invoked  and  the  exact  ML  criterion  is 
optimized  w.r.t.  a  single  frequency  at  every  step  while  keeping  the  others  fixed  at  previously  estimated  values. 
It  may  be  emphasized  here  that  the  computational  simplicity  of  the  proposed  approach  comes  from  the  fact  that 
every  iteration  requires  optimization  with  respect  to  only  one  frequency.  This  can  be  easily  performed  by  finding 
the  peak  of  the  periodogram.  The  proposed  method  is  iterative.  It  requires  crude  prior  estimates  or  regions  of 
interest  of  the  frequencies.  The  initial  estimates  may  again  be  obtained  from  the  periodogram  peaks  or  any  other 
method.  In  fact,  our  simulations  indicate  the  effectiveness  of  the  algorithm  does  not  diminish  much,  even  if  the 
initial  estimates  are  chosen  in  a  completely  random  manner. 

This  Section  is  arranged  as  follows:  The  problem  is  formulated  in  Subsection  II  and  the  proposed  algorithm 
is  described  in  Subsection  III.  Simulation  results  are  given  in  Subsection  IV  and  finally  some  concluding  remarks 
have  been  included  in  Subsection  V. 

II.  Formulation  of  the  Problem  : 

Let  x(n),  n  =  0, 1, . . . ,  A  —  1,  be  an  observation  data  record  of  N  consecutive  samples  of  the  multiple  complex 
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exponential  signal  model  which  is  defined  as, 

p 

x(n)  A  ^  n  =  0, 1,. .  .,1V  _  1  where,  (IJ.l) 

*  =  1 

at  unknown  real  amplitude  of  the  ib**  sinusoid, 

<i>t  unknown  phase  of  the  sinusoid, 

wt  unknown  angular  frequency  of  the  sinusoid  and 

p  assumed  number  of  sinusoids. 

The  observation  samples  are  expressed  as, 

ir(n)  =  x(n)  +  z(n),  (77.2) 

where,  z(n)  represents  observation  noise  and/or  modeling  error.  In  vector  form,  the  observed  samples  x{n),  the 
model  scimples  i(n)  and  the  noise  samples  z(n),  for  n  =  0, 1, . . . ,  IV  —  1,  are  related  as, 

X  A  X  +  z  where,  (^1^-3) 


X  A  [i(0)  x(l)  ....  x(lV-l)]‘ 

X  A  [£(0)  i(l)  . . . ,  x{N  —  1)]*  and 
z  A  [z(0)  z(l)  ...,  2(1V-1)]‘, 


(77.4a) 

(77.46) 

(77.4c) 


where,  denotes  matrix  or  vector  transpose.  The  multiple  complex  sinusoids  model  vector  x  is  equivalently 
described  by  the  following  matrix-vector  decomposition. 


X  =  Ta  where, 


and  a  A 


(77.5) 


(77.6) 


where,  At  A  ate^*^ ,  for  Jfc  =  1,2, . .  .,p,  respectively,  are  the  complex  amplitudes.  The  problem  under  consid¬ 
eration  here  is  to  choose  or  estimate  the  best  model  parameters  Ai,A2,  ■  -  -  lAf,  and  wi,W2,  • .  .,Wp  such  that  the 
modeling  error  norm, 

||e||2  A  £;(T,a)  A  ||x  -  Ta\\l  (77.7) 

is  minimized. 

Next,  a  brief  derivation  is  given  to  show  that  the  least-squares  criterion  in  (II. 7)  is  indeed  the  one  required  for 
MLE  when  the  noise  samples  z(n)  are  white  and  gaussianly  distributed.  More  details  may  be  found  in  [Rife  surd 
Boorstyn,  1976].  If  the  observed  samples  x  in  (11.4a)  are  composed  of  multiple  sinusoids  in  Gaussianly  distributed, 
zero-mean  and  complex  white  noise,  the  probability  density  function  (PDF)  of  x  is  given  by. 


7r^det(R«) 


(77.8) 


where  x  is  defined  in  (II.4b)  and  is  the  N  x  N  autocorrelation  matrix  of  the  noise.  Since  the  noise  is  assumed 
to  be  white, 

R„  =  trll.  (77.9) 
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To  obtain  the  MLE  of  the  unknown  parameters  one  needs  to  optimize  the  following  criterion, 


max  V{x.  -  x)  =  max  „ 


1  -Ji(x-*)"(x-x) 

t 


This  criterion  is  exactly  equivalent  to 
Now,  writing  the  model  vector  x  explicitly  fron:  (II.5)  and  defining  the  fitting  error  as. 


min  (x  — x)^(x  — x). 


(//.lO) 

(//■ll) 


e(T,a)  A  X  -  Ta,  (i/.12) 

the  criterion  becomes 

.min  £:(T,a)  A  min  e"(T,a)e(T,a).  (77.13) 

This  criterion  is  exactly  same  as  the  least  squares  fitting  criterion  in  (II.7).  Note  that  the  error  e(T,  a)  in  (11.13) 
is  linearly  related  to  the  parameters  in  a  while  it  is  nonlinearly  related  to  the  frequencies  in  T.  Hence  the 
minimization  of  £(T,a)  is  a  non-linear  multidimensional  optimization  problem. 

Considerable  work  have  been  reported  on  the  direct  optimization  of  the  criterion  in  (II. 11).  Notable  among 
them  are  the  work  by  Golub  and  Pereyra  [1975],  Partbassirathy  and  Tufts  [1984]  and  Rife  and  Boorstyn  [1976]. 
These  approaches  are  mainly  based  on  Newton-Raphson  or  Gauss-Newton  type  algorithms  Performance  of  the 
general  non-linear  multidimensional  optimization  algorithms  depend  primarily  on  the  choice  of  initial  estimates. 
The  proposed  algorithm  is  described  next. 

III.  The  Maxim'im-Likelihood  Algorithm  t 

In  order  to  motivate  the  proposed  approach  let  us  consider  an  ideal  case  first.  To  facilitate  the  splitting  of 
the  multi-dimensional  optimization  criterion  in  (11.13)  into  several  1-D  optimization  problems,  we  rewrite  the 
Vandermonde  matrix  T  in  (II.6)  in  the  following  form. 


T  A  [ti  t2  ...tt...tp]‘  where,  (777.1) 

t*  A  [1  (777.2) 

Next,  a  Vandermonde  matrix  T*  is  formed  with  (p—  1)  of  the  p  frequencies  amd  excluding  the  t*  vector,  i.e., 

Tt  A  [ti  t2  tfc+i...tp]‘.  (777.3) 

Similarly,  define  the  corresuonding  amplitude  vector  as, 

a*  A  [oi  02  ot+j...Op]*.  (777.4) 

Using  the  above  notations,  the  model  vector  x  of  (II.5)  is  rewritten  as, 

X  =  Ta  =  Ttat  -1-  ttA*.  (777.5) 

Plugging  this  in  (11.12)  we  get, 

e(T,a)  =  X  -  Tta*-ttAt.  (777.6) 
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If  the  frequency  and  the  amplitude  terms  in  Tt  and  at  are  known  exactly,  the  error  can  be  rewritten  as, 


e(tt,^t)  =  xj*  -  ttAt,  {1 1 1. 7) 

where,  is  a  modified  data  vector  defined  as, 

xj*  A  X  -  T*at.  (III. 8) 

Clearly,  the  minimization  of  the  norm  of  the  error  in  (111.7),  i.e., 

min  llxj*  -  ttyltlll  (1 1 1. 9) 

is  a  one-dimensional  optimization  problem.  In  fact,  the  maximum-likelihood  estimate  of  the  unknown  frequency 
uic  in  tt  can  be  found  from  the  peak  location  of  periodogram  plot  of  the  modified  ’data’  vector  x™.  Furthermore, 
the  minimization  in  (III. 9)  can  be  carried  out  for  each  k  =  1,2, . .  .,p  and  the  estimates  of  all  the  p  unknown 
frequencies  can  be  found  by  performing  p  FFTs! 

The  approach  outlined  above  seems  perfect  except  that  it  presumes  the  ideal  case  of  exact  knowledge  of  the 
frequencies  which,  in  fact,  we  sought  to  evaluate  in  the  first  place.  In  practical  situations,  the  p  —  1  frequencies 
in  Tt  and  the  corresponding  amplitudes  in  &k  which  are  needed  to  form  the  x]^  in  (III.8)  will  not  be  known. 
Instead,  we  propose  to  replace  Tjt  and  at  by  the  corresponding  estimates.  Then  the  optimization  can  not  be 
accomplished  with  only  one  set  of  p  FFTs  as  anti  'ipated.  In  that  case,  the  optimization  procedure  would  have 
to  be  done  iteratively.  The  iterative  algorithm  is  outlined  next. 

Let  us  first  assume  that  approximate  initial  estimates  of  (p  —  1)  of  the  frequencies  and  corresponding  am¬ 
plitudes  are  available  (from  periodogram  or  linear  prediction  or  any  other  method)  and  that  the  k*^  of  the  p 
frequencies  is  unknown  or  needs  to  be  updated.  Now,  separating  the  known  and  unknown  parts  of  T,  we  can 
write  the  observation  vector  x  in  (II. 3)  as, 

X  A  Ttat  -h  ttAt  -(-  z 

A  x*  ttAt  -f  z  (7/7.10) 

where  T*  and  at  have  the  same  forms  as  in  (III.3)  and  (III.4)  except  that  the  exact  values  are  replaced  by  the 
corresponding  estimated  values  and  xt  A  Tt&t-  Now  defining  x*  A  x  —  xt,  (III. 10)  can  be  rewritten  as, 

xt  A  ttAt  +  z.  (777.11) 

We  now  treat  the  vector  x*  as  the  ’data’  vector  and  correspondingly  rewrite  the  error  criterion  for  the  Jb-th 
frequency  in  (III. 9)  as. 


min  ||et||^  A 

i^k.Ak  = 

A 


min  E{tt,At) 
min  ||xi  -  tiAtlll. 


(777.12) 


Once  again,  the  optimization  problem  in  (111.12)  is  with  respect  to  the  parameters  associated  with  a  single 
frequency  only.  Once  the  frequency  is  estimated,  the  corresponding  amplitude  may  be  obtained  by  the 
following  pseudo-inverse  solution, 


1  - 


At  = 


(777.13) 
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The  updated  estimates  of  ut  and  At  can  then  be  included  to  form  and  At^.x  similar  to  (III.3)  and  (III. 4). 
The  same  steps  may  then  be  followed  for  updating  the  k  + frequency  by  optimization  of  the  following  criterion, 

min  ||xt+i  -  (///.14) 

In  this  manner  at  each  updating  step  one  of  the  frequency  estimates  is  re-evaluated  while  keeping  the  other  p  —  1 
frequencies  fixed  at  previously  estimated  values.  At  every  iteration  level  the  updating  starts  with  A  =  1  and 
goes  up  to  A  =  p,  exhausting  all  the  frequency  components.  The  iterations  are  continued  till  no  further  changes 
are  observed  in  the  frequency  estimates.  Note  that  at  every  step  of  optimization  of  (III. 12),  the  ML  criterion  in 
(II. 7)  is  minimu'cd,  albeit  with  respect  to  one  frequency  at  a  time.  Hence,  at  convergence  the  ML  estimates  are 
obtained. 

The  attractiveness  of  the  proposed  algorithm  comes  from  its  sheer  simplicity  in  implementation.  The  algo¬ 
rithm  may  also  be  used  as  an  adaptive  frequency  tracker.  In  a  radar  or  sonar  environment,  the  spatial  frequencies 
which  are  related  to  the  angular  positions  of  far-field  sources  may  change  as  the  sources  move  [Kumaresan  and 
Shaw,  1987],  the  proposed  algorithm  will  be  able  to  adapt  to  such  changing  environment.  In  the  next  two 
paragraphs  we  consider  the  important  issues  of  model  order  selection  and  the  choice  of  the  initial  estimates. 

Model  Order  Selection  ;  In  the  above  discussion  of  the  algorithm,  we  have  assumed  that  the  model  order  p  is 
exactly  known.  In  practice,  one  may  not  have  that  knowledge  and  the  model  order  needs  to  be  estimated.  Two 
of  the  most  commonly  used  order  selection  criteria  are  the  Akaike  Information  Criterion  (AIC)  [Akaike,  1974] 
and  the  Minimum  Description  Length  (MDL)  criterion  [Rissanen,  1978,  1983].  Both  these  criterions  are  perfectly 
suited  for  the  proposed  approach  because  both  AIC  and  MDL  criterions  are  computed  using  the  logarithm  of  the 
minimized  error  norm  in  (II. 7)  along  with  the  number  of  parameters  under  optimization.  MDL  also  requires  the 
observation  data  length  N. 

Choice  of  Initial  Estimates  of  frequencies  :  Any  estimation  technique  such  as  linear  prediction  or  coarse  pe- 
riodogram  peaks  may  be  used  to  get  the  initial  estimates.  In  our  simulations  we  chose  the  periodogram  peak 
locations  as  the  initial  estimates  of  frequencies.  If  there  is  a  single  merged  perdc  indicating  the  possibility  of  more 
than  one  frequency  in  that  region,  one  may  choose  the  peak  locations  as  the  possible  frequencies  and  the  other/s 
may  be  chosen  within  the  peak  lobe  width  according  to  the  number  of  sinusoids  present.  But  the  algorithm 
seemed  to  be  highly  robust  in  the  sense  that  in  our  simulations  the  perform2uice  did  not  deteriorate  even  for 
random  choice  of  initial  estimates. 

IV  :  Simulation  Results 

The  algorithm  described  above  has  been  tested  on  simulated  data.  The  following  formula  is  used  to  generate 
the  data. 


i(n)  =  Qie^"*  -1-  026^"’"  -1-  z(n)  (7V.1) 

n  =  0,  1,  ...,24 

where  wj  =  25r/i,  W2  =  2x/2,  f\  and  fi  being  0.23  and  0.26,  respectively,  oi  =  02  =  1  and  2(n)  is  a  computer 
generated  white  complex  gaussianly  distributed  noise  sequence  with  variance  2<t^.  is  the  variance  of  the  real 
and  the  imaginary  parts  of  z(n).  SNR  is  defined  as  10  logioQ  /2(T^).  One  hundred  sets  of  samples  with 
different  noise  epochs  were  used. 

For  the  observation  records  described  above  the  error  criterion  given  in  (II. 3)  was  minimized  by  following 
the  algorithm  described  in  Subsection  III.  The  periodogram  peak  location  with  a  64-point  FFT  was  used  to  find 
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the  coarse  initial  estimate  to  form  ti  and  from  that  Ai.  In  the  figure  below,  10  log  calculated  with  the  bias 
and  variance  for  100  trials  at  different  SNR  values  are  plotted.  Cramer-Rao  bounds  (CRB),  which  give  the  lower 
bound  on  the  variance  of  fi  (or  fi)  for  the  corresponding  SNR  values,  are  also  plotted.  The  results  indicate  that 
the  bias  in  the  estimates  is  negligible  and  the  Mean  Squared  Errors  remain  close  to  the  CRB  up  to  about  OdB 
SNR.  The  iterative  procedure  always  converged  at  every  trial  in  5  to  15  iterations.  The  present  algorithm  seems 
to  push  the  SNR  threshold  even  lower  than  reported  results.  More  studies  and  comparisons  with  other  algorithms 
need  too  done.  In  order  to  test  the  robustness  and  sensitivity  of  the  algorithm  to  initial  estimates,  we  also  ran 
simulations  with  initial  frequency  estimates  chosen  completely  randomly  from  a  uniform  noise  generator.  It  was 
found  that  the  algorithm  performed  almost  as  well  as  in  the  previous  experiment.  This  robustness  aspect  of  the 
algorithm  should  be  studied  more  in  future. 

V  :  Conclusion 

A  new  algorithm  for  Maximum-Likelihood  frequency  estimation  is  presented.  Unlike  all  known  ML  methods 
for  frequency  estimation,  the  proposed  algorithm  does  not  require  any  multidimensional  optimization.  The 
multidimensional  problem  is  split  into  several  1-D  optimization  problems.  For  computation,  the  algorithm  only 
requires  the  FFT  algorithm  which  is  extensively  used  in  signal  processing.  Our  simulations  indicate  that  the 
algorithm  pushed  the  threshold  down  to  OdB  SNR.  The  iterative  algorithm  also  showed  remarkable  robustness 
to  the  choice  of  initial  estimates.  We  intend  to  extend  the  algorithm  to  the  more  general  case  of  frequency- 
wavenumber  estimation  and  also  for  Toeplitz  matrix  approximation. 
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section  3.2  :  A  Parameter  adaptive  Simulated  annealing  Scheme  for  Frequency 
Estimation 

Summary 

A  Simulated  Annealing  scheme  based  on  parameter  adaptive  cooling  schedule  is  proposed.  In  the  existing 
annealing  schemes,  the  temperature  parameter  is  predetermined  for  every  iteration  step  and  is  independent  of 
the  unknown  parameter  values.  In  the  proposed  scheme,  the  cooling  temperature  is  made  proportional  to  the 
deviation  of  each  individual  parameter  at  the  earlier  iteration  step.  The  other  key  difference  in  the  proposed 
scheme  is  that  it  never  accepts  a  higher  energy  level  and  remains  at  the  present  lower  energy  position.  Instead, 
the  Boltzmann  Distribution  is  used  to  accept  a  larger  cooling  temperature  t.e,  a  broader  parameter  search  space. 
The  algorithm  is  then  applied  to  the  well  known  nonlinear  optimization  problem  of  frequency/angles  of  arrival 
estimation  of  multiple  sources.  Simulation  results  indicate  that  the  proposed  scheme  converges  to  the  minimum 
energy  level  in  fewer  iteration  steps  when  compared  to  an  existing  fast  annealing  algorithm. 


Introduction 

Multidimensional  and  non-linear  optimization  problems  occur  in  engineering,  economics,  geophysics,  as  well 
as  in  almost  all  fields  of  science.  A  plethora  of  literature  on  standard  optimization  algorithms  are  available  in  the 
literature  but  in  this  decade  a  new  and  powerful  optimization  algorithm  called  Simulated  Annealing  has  emerged 
and  has  found  ever  increasing  attention  in  many  applications.  Although  simulated  annealing  was  originally 
proposed  by  Metropolis  in  1953,  only  recently  it  has  found  successful  applications  in  constrained  and  unconstrained 
optimization  problems  [Kirkpatrick  et  al,  1983;  Gelfand  and  Mitter,  1985;  El  Gamalet.  al,  1987,  Sharman,  1988]. 
Also  very  recently,  Szu  [1986]  has  developed  a  fast  annesding  scheme  which  has  improved  convergence  rates.  It  is 
well  known  that  the  performance  of  almost  all  other  existing  nonlinear  optimization  techniques  are  highly  sensitive 
to  the  initial  estimates  and  most  algorithms  cannot  come  out  of  a  locatl  optimum  if  it  happens  to  reach  one  during 
the  iterative  process.  Simulated  annealing  is  a  form  of  stochastic  optimization  and  two  of  its  most  important  and 
exciting  features  are  that  it  can  escape  local  stationary  points  of  a  cost  function  and  that,  ‘theoretically’,  it  is 
guaranteed  to  reach  the  global  optimum. 

Although  the  presently  available  annealing  algorithms  are  quite  attractive  it  is  generally  felt  that  even  the 
fast  annealing  technique  by  Szu  requires  considerable  computations.  This  is  mainly  due  to  the  large  number  of 
iteration  steps  involved.  A  further  reduction  in  the  convergence  rate  will  have  far  reaching  advantages  in  many 
applications.  The  major  goal  of  the  proposed  research  is  to  explore  a  possible  annealing  scheme  which,  according 
to  preliminary  studies,  seems  to  provide  faster  convergence  than  existing  techniques. 

Simulated  Annealing  and  the  Globally  Optimal  Design 

The  idea  of  simulated  annealing  may  be  understood  in  the  following  manner.  Assume  that  we  have  a  box 
containing  an  unknown  3-dimensional  terrain  with  valleys  and  peaks  at  unknown  locations  and  we  are  interested 
in  finding  the  deepest  point  in  the  terrain.  A  simple  method  would  be  to  take  a  ball  and  drop  it  inside  the 
box  containing  the  terrain  because  the  ball  will  eventually  roll  down  and  reach  the  deepest  point  in  the  valley 
in  which  it  was  dropped.  Of  course,  if  the  ball  is  dropped  in  another  location  which  may  be  another  valley,  it 
would  roll  down  to  the  deepest  point  of  that  particular  valley.  Unfortunately,  none  of  these  two  deep  points  of 
the  individual  valleys  may  be  the  deepest  point  of  the  whole  terrain  which  is  what  we  were  seeking  in  the  first 
place.  This  is  exactly  how  the  conventional  gradient  based  optimization  algorithms  work.  The  terrain  surface 
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may  represent  the  multimodal  surface  of  a  cost  criterion  C{0)  one  is  trying  to  minimize  where,  0  is  a  vector 
containing  the  underlying  unknown  parameters.  To  reach  a  minimum,  one  supplies  an  arbitrary  initial  estimate- 
The  conventional  optimization  algorithms,  which  are  munly  gradient  based,  would  take  one  to  the  closest  local 
minimum  by  going  in  the  direction  of  decreasing  gradient  (t.e,  decreasing  cost)  until  zero  gradient  is  reached.  But 
again,  if  the  cost  function  has  multiple  local  minima,  one  may  not  have  reached  the  global  minimum  of  C{B).  The 
whole  thing  really  boils  down  to  the  choice  of  the  initial  estimate.  A  stiaight-forwaid  solution  for  this  problem 
would  be  to  compute  the  cost  criterion  at  all  possible  values  of  the  underlying  parameters  and  choose  the  set 
which  corresponds  to  the  lowest  cost.  Unfortunately,  this  simple  solution  will  be  prohibitively  expensive  even  for 
a  small  number  of  parameters  with  infinite  possibilities  for  each.  But  consider  the  ball-in-the-terrain-box  scenario 
again  and  assume  that  the  ball  is  resting  at  one  of  the  local  minimum.  Now,  if  one  can  somehow  agitate  the 
terrain  box  and  do  it  vigorously  enough  such  that  there  is  a  positive  probability  that  the  ball  will  be  dislocated 
from  its  resting  position  and  will  jump  to  a  position  with  higher  cost  but  in  another  valley  of  the  terrain.  Then  one 
can  start  the  search  process  all  over  again.  In  this  manner,  with  a  large  enough  number  of  attempts,  one  would 
ultimately  reach  the  valley  containing  the  global  optimum.  The  problem  then  is  to  seek  the  global  optimum  and 
remain  there.  This,  indeed,  is  the  fundamental  premise  of  simulated  annealing,  t.e,  it  can  tacapt  a  locsd  minimum 
of  a  multimodal  cost  criterion  and  can  ultimately  reach  the  global  optimum. 

From  the  description  given  above,  it  is  obvious  that  one  has  to  be  very  careful  in  how  much  one  should 
agitate  the  box  when  the  ball  is  resting  at  one  of  the  deeper  points.  If  one  shakes  the  box  too  much,  the  ball 
may  even  jump  from  the  globally  deepest  point  and  if  one  shakes  the  box  too  slow,  the  ball  may  remain  trapped 
forever  inside  a  valley  with  a  local  optimum.  Both  of  these  would  be  undesirable  and  hence,  one  needs  a  proper 
strategy.  In  true  annealing,  a  heated  solid  is  allowed  to  cool  to  its  minimum  energy  state.  The  molecules  use 
random  motion  to  search  for  new  positions  of  lower  energy.  The  likelihood  of  reaching  the  least  energy  state 
depends  on  how  fast  the  solid  is  cooled.  To  apply  this  analogy  to  the  cost  minimization  problem,  it  is  necessary 
to  meet  the  following  four  objectives  [Szu,  1986],  namely, 

(i)  A  cost  criterion,  C{6)  which  we  seek  to  optimize,  where  6  ^[6^  6^ ..  .0*’]*  is  the  vector  containing  the  p 
unknown  parameters.  denotes  the  matrix  or  vector  tramspose  operation. 

(ii)  A  rule  for  generating  new  candidate  parameter  estimates  ;  The  fast  annealing  scheme  employs  a  Cauchy 
generating  density,  which  for  N  parameters  in  6  may  be  expressed  as, 

where,  the  parameter  c  is  the  temperature  parameter  Tc{i)  which  is  found  directly  from  the  cooling  schedule 
introduced  next. 

(iii)  A  gradual  cooling  schedule  Te(t) :  It  has  been  shown  by  [Szu,  1986]  that  the  necessary  and  sufficient  condition 
for  convergence  of  the  fast  annealing  algorithm  to  the  global  optimum  requires  the  cooling  schedule  to  be  no 
faster  than  the  following  inverse  time  law  : 


(1) 


Tc{t)  ^  1 

To  1  +  I  ’ 


(2) 


where.  To  ^  0  is  a  sufficiently  high  initial  temperature. 

(iv)  A  hill-climbing  (cost-increasing)  acceptance  probability  ;  This  is  given  by  the  following  Boltzmann  distribu¬ 
tion  , 

p( accepting  higher  cost)  A  po  =  - ■  (3) 

—  1  -f.  c*bK(i) 
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where,  ka  is  the  Boltzmann  constant  and 

AC  A  Ctie)  -  (4) 

From  these  objectives,  it  is  clear  that  the  requirements  (tt)  —  (tv)  are  of  generic  nature  and  do  not  change 
for  different  problems,  whereas  for  (t),  the  cost  function  will  have  to  be  decided  specifically  for  each  optimization 
problem  under  consideration.  Next,  we  briefly  motivate  a  possible  new  annealing  scheme  which  seems  to  provide 
faster  convergence  than  existing  techniques. 

Motivation  for  the  Proposed  Annealing  Scheme 

The  main  motivation  of  the  development  of  the  simulated  annealing  algorithms  has  been  to  achieve  conver¬ 
gence  at  the  global  optimum  of  a  cost  criterion  which  is  nonlinearly  related  to  the  underlying  unknown  parameters. 
The  existing  algorithms  by  Metropolis  et  al  and  Szu  do  achieve  this  goal  but  even  the  faster  scheme,  due  to  Szu, 
may  have  very  slow  convergence  for  many  applications.  This  drawback  of  FSA  may  be  due  to  the  rigid  cooling 
schedule  (2)  used  by  the  annealing  scheme.  Experimentations  with  the  FSA  indicate  that  the  convergence  rate 
depends  on  the  choice  of  the  inibiJ  ‘-'^perature  Tq.  For  example,  if  the  parameter  values  in  0  at  the  optimum 
point  happen  to  be  relatively  small  compared  to  To,  the  cooling  may  not  have  any  major  effect  until  the  tem¬ 
perature  becomes  low  and  comparable  to  the  parameter  values.  We  feel  that  it  may  be  helpful  to  make  the 
cooling  schedule  depend  adaptively  to  the  parameter  values.  Also,  the  cooling  schedule  of  FSA  does  not  provide 
any  possibility  for  reheating  which,  we  think,  may  be  advantageous  in  the  search  for  the  optimum.  Especially, 
reheating  may  be  useful  when  the  temperature  has  become  very  low  but  the  algorithm  has  not  reached  close  to  the 
optimum  point.  We  also  feel  that  constant  temperature  search  may  also  facilitate  the  annealing  procedure.  Our 
preliminary  experiments  also  indicate  a  potential  problem  with  FSA  in  that,  if  the  annealing  iterations  happen 
to  teach  the  globally  optimum  point  at  a  relatively  high  temperature,  there  is  a  large  probability  that  it  may 
leave  the  optimum  point.  This  is  due  to  the  built  in  cost-increasing  acceptance  probability  inherent  in  the  FSA 
scheme.  This  seems  to  be  a  necessairy  evil  because  it  is  the  cost-increasing  acceptance  probability  which  really 
provides  FSA  the  mechanism  to  escape  from  local  optimum  points.  But  the  problem  is  that  the  algorithm  has  no 
way  of  knowing  if  it  is  escaping  a  local  or  the  global  optimum  point.  Theoretically,  of  course,  the  algorithm  will 
eventually  come  back  to  the  globally  optimum  point  but  that  may  be  after  a  very  large  number  of  iterations.  We 
feel,  one  way  to  avoid  this  may  be  to  stay  fixed  at  the  point  of  the  lowest  available  cost  until  another  point  with 
lower  cost  is  found.  The  mechanism  to  escape  from  a  local  optimum  point  can  be  incorporated  in  the  algorithm 
by  constant  and  high  temperature  search  and  also  by  expanding  the  search  space  according  to  the  acceptance 
probability. 

Keeping  in  mind  the  drawbacks  of  the  existing  simulated  annealing  schemes  and  the  possible  solutions 
as  outlined  above,  we  propose  here  a  new  simulated  annealing  scheme  so  as  to  reduce  the  convergence  rate. 
The  temperature  parameter  Tc{k)  plays  a  pivotal  role  in  our  annealing  scheme.  At  every  iteration  the  Cauchy 
parameter  generating  density  in  (1)  uses  Tc{k)  as  the  parameter  c  to  generate  the  random  deviations  in  parameter 
values.  But  c  is  the  semi-interquartile  of  the  Cauchy  density  function  which  means  that  there  is  a  50%  percent 
probability  that  the  random  deviations  generated  by  the  Cauchy  density  will  fall  within  ±c.  It  seems  logical  to 
expect  that,  instead  of  rigidly  pre-specifying  the  cooling  schedule,  one  may  be  able  to  hasten  the  search  process 
by  adaptively  selecting  Tc{k)  according  to  the  parameter  values.  Also,  the  cost  function  is  optimized  w.r.t. 
several  parameters  and  at  the  point  of  lowest  energy,  these  parameters  may  have  values  with  different  orders  of 
magnitude.  In  such  cases,  it  may  be  advantageous  to  make  the  semi-interquartile  (i.e,  the  temperature  Te{k))  for 
each  parameter  dependent  on  the  change  in  individual  parameters.  With  these  observations  in  mind  we  propose 
to  study  a  possible  faster  adaptive  simulated  annealing  scheme  outlined  in  the  next  Subsection. 


» 
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A  Faster  Simulated  Annealing  Scheme  Based  on  Adaptive  Cooling  Schedule 
The  major  steps  of  the  proposed  annealing  algorithm  are  summarized  below  : 

Initialize  the  parameter  set  6  with  arbitrary  values  and  compute  the  cost  function  C{0).  Initialize  the 
temperatures,  i.e,  the  semi-interquartile  parameters  7^  for  each  parameter  9*  at  sufficiently  large  values. 
Initieilize  iteration  no.  k  =  Q. 

Set  k  =  k  +  1.  Stochastically  generate  a  new  set  of  parauneters  followix^g  the  Cauchy  state-transition 
probability  defined  in  (1)  but  with  different  semi-interquartile  values.  Initialize  the  parameter  number  i  =  0. 

Set  t  =  t  4-  1.  Compute  the  cost  function  C*(0)  using  the  new  value  of  the  parameter  9*  and  the  values  of 
the  other  parameters  set  at  the  previous  iteration  step.  Then  calculate  the  difference  in  cost 

AC*  A  Ci(9)  -  Ci-'(9)  (5) 

Note  that  if  i  =  1,  the  last  term  will  be  C'f_,(0). 

If  AC  is  negative,  i.e,  if  the  new  cost  is  lower  than  the  one  in  the  previous  iteration  step,  accept  the  new 
parameter  set  and  the  new  lower  cost  and  set  the  temperature  at  7^(k)  =  A9‘  A  |^*(4)  —  9‘(k  —  1)|.  I: 
i  =  p  go  to  step  2,  else  go  to  step  3. 

If  AC  is  positive,  the  cost  is  fixed  at  the  previous  (lower)  value,  i.e,  but  accept  a  new  temperature 

(semi-interquartile)  7^(k)  =  mA9‘  with  a  probability  determined  by  the  Boltzmann  distribution  given  by 
(3)  using  the  temperature  7^(k)  and  the  cost  change  in  (5).  m  >  2  is  a  constant  that  expands  the  search 
space.  If  i  =  p  go  to  step  2  else,  go  to  step  3.  If  all  the  AC’s  remain  same  for  a  “long”  time  then  STOP. 

The  new  adaptive  simulated  algorithm  as  outlined  above  avoids  many  of  the  problems  associated  with 
the  existing  simulated  annealing  schemes.  For  example,  the  temperature  parameter  is  made  dependent  on  the 
difference  in  the  parameter  values  and  hence  they  are  of  the  order  of  the  parameters  themselves.  The  minimum 
energy  level  parameters  are  kept  unchanged  so  that  there  will  not  be  any  possibility  of  escape  from  the  optimum 
minimum.  Also  an  escape  mechanism  from  local  minimum  is  provided  by  expanding  the  search  space  according 
to  the  Boltzmann’s  probability  when  the  cost  is  higher  than  that  in  the  previous  step. 

To  demonstrate  the  effectiveness  of  the  proposed  algorithm,  we  applied  it  to  the  problem  of  locating  the 
optimum  point  of  the  cost  function  C(9)  =  9*  —  169^  4-  59  [Szu,  1986].  The  results  of  the  simulations  were 
quite  encouraging  and  are  discussed  later.  The  results  clearly  indicate  that  the  proposed  algorithm  converges 
faster  to  the  optimum  point  than  the  existing  technique  due  to  Szu.  Next  we  apply  the  algorithm  to  one  of  the 
well  known  optimization  problems  in  signal  processing,  namely,  frequency  or  angles  of  arrival  estimation.  Earlier, 
Sharman  [1988]  used  Szu’s  technique  to  the  frequency  estimation  problem.  Here  we  compare  the  results  of  the 
two  algorithms  with  a  number  of  simulation  studies.  In  the  next  Subsection  the  frequency  estimation  problem 
and  the  appropriate  cost  criterion  are  formulated. 

The  Frequency  Estimation  Problem 

Estimation  of  frequencies  from  data  composed  of  multiple  narrowband  signals  in  noise  is  one  of  the  oldest 
problems  studied  in  several  branches  of  science.  To  define  the  problem,  let  i(n),  n  =  0, 1, . . . ,  TV  —  1,  be  a  data 
record  of  N  consecutive  samples.  The  multiple  complex  exponential  signal  model  is  defined  as 

p 

i(n)  A  Yi  n  =  0, 1, . . .,  TV  -  1  (6) 

~  *=i 


1  : 

2  : 

3  ; 

4(a)  : 

4(b); 


where 
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Ok  unknown  amplitude  of  the  k'*  sinusoid, 

^jb  unknown  phase  of  the  sinusoid, 

ut  unknown  angular  frequency  of  the  sinusoid  and 

p  assumed  number  of  sinusoids. 

Note  that  for  the  angle  of  arrival  estimation  problem  N  denotes  the  number  of  sensors.  The  sensors  are  assumed 
to  be  separated  by  half  the  wavelengths  of  the  incident  waves  and  hence  =  ir  sin  0k ,  where  0k  denotes  the 
angle  of  the  ktb  source  relative  to  the  array  normal. 

The  observation  samples  are  expressed  as 

i(n)  =  i(n)  +  z(n)  (7) 


where,  z(n)  represents  observation  noise  and/or  modeling  error.  In  vector  form,  the  observed  samples  z(n),  the 
model  samples  z(n)  and  the  modeling  error  samples  z(n),  for  n  =  0, 1, . . . ,  N  —  1,  are  related  as 


X  A 

where, 

X  A  [i(0)  x(l) 
X  A  [£(0)  £(1) 
z  A  [r(0)  z(l) 


-f  Z 

(8) 

...  *(^-i)r 

(9a) 

...  i(N-l)Y 

(96) 

...  zfJV-l)]‘. 

(9c) 

The  multiple  complex  sinusoids  model  vector  x  is  equivalently  described  by  the  following  matrix-vector  decom¬ 
position, 

X  =  Ta  (10) 


where 


/  1 

1 

1  \ 

/Ai\ 

ejw, 

e^'-'r 

Aj 

T  A 

ejwp(JV-i)  y 

and  a  A 

\Ap/ 

(11) 


where,  Ak  A  for  k  =  1,2, ...,p,  respectively,  are  the  complex  amplitudes.  The  problem  under 

consideration  here  is  how  to  choose  or  estimate  the  best  model  parameters  Ai,A2 . Ap,  and  b;i,U2, •  ■  ■ 

such  that  the  following  modeling  error  norm 


||e||2  A  ^(T,a)  A  ||x  -  Ta||i 


(12) 


is  minimized. 

One  may  use  this  error  criterion  as  the  energy  function  and  minimize  it  with  respect  to  the  unknown  am¬ 
plitudes,  phases  and  the  frequencies.  In  most  cases,  one  is  mainly  interested  in  the  unknown  frequencies  (or  the 
angles  of  arrival).  Also,  the  unknown  vector  a  is  linearly  related  the  error  vector  e  and  can  be  eliminated  from 
the  error  criterion.  It  can  be  shown  [Shaw,  1987,  Kumares2ui  et  ai,  1986]  that  the  frequencies  can  be  directly 
obtained  by  minimizing  the  following  alternate  error  criterion  : 


EiT)  A  11(1  -  Pt)x||| 


(13) 
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where,  Pt  is  the  projection  matrix  and  is  defined  as  follows  : 

Pt  a  t(T"T)-^T"  (14) 

where  denotes  complex  conjugated  matrix  transpose  operation.  The  optimization  problem  described  by 
equation  (13)  is  a  highly  non-linear  multi-dimensional  optimization  problem.  Regarding  the  presently  available 
methods,  the  simpler  of  the  existing  algorithms  are  usually  suboptimal  and  the  optimal  techniques,  being  depen¬ 
dent  on  the  initial  conditions,  are  not  guaranteed  to  attun  the  global  optimum  and  are  usually  computationally 
expensive.  We  strongly  feel  that  simulated  annealing  offers  an  exciting  alternative  to  solve  this  optimization 
problem.  Extensive  simulation  studies  have  been  done  with  both  the  annealing  schemes  outlined  above  using  the 
following  cost  criterion  ; 

C{0)  A  11(1  -  Pt)x|1|  (15) 

where  0  A  [ui  U2  •  -  ^p]*-  The  simulation  results  are  summarized  in  the  next  Subsection. 

Simulation  Results 

The  proposed  simulated  annealing  algorithm  has  been  evaluated  by  extensive  simulations  on  a  variety  of  data 
sets  and  we  report  the  results  of  the  three  experiments  here.  In  the  last  two  experiments  the  cost  function  of  the 
form  given  in  (15)  is  minimized  and  gaussianly  distributed  white  noise  are  used. 

Experiment  1: 

In  this  experiment  we  considered  a  simple  example  (Szu,  1986]  to  demonstrate  how  the  proposed  technique 
outperforms  the  existing  annealing  methods.  In  this  example  the  cost  function  C(x)  =  —  16ar^  +  5x  has  one 

local  minima  and  a  global  minima.  Several  trials  of  annealing  runs  were  made  by  both  the  methods  and  the 
results  are  tabulated  in  Table  I.  In  both  the  methods  the  iterations  were  continued  till  the  globed  minima  is 
reached.  In  the  proposed  technique,  once  the  global  minima  is  reached  it  continues  to  stay  there  for  ever  whereas 
in  Sharmau’s  [4]  method  it  would  remain  in  the  global  minimum  only  when  the  temperature  becomes  sufficiently 
low. 

Experiment  2: 

This  experiment  involves  estimating  the  frequencies  of  two  narrowband  signals  in  noise.  The  data  record 
consists  of  25  consecutive  samples  generated  from  two  sources  of  closely  spaced  frequencies  of  0.5  and  0  52  Hertz. 
The  SNR’s  of  both  the  sources  were  10  dB’s  The  annealing  technique  is  applied  to  a  single  set  of  data  and  the 
results  are  compared  with  that  of  Sharman  [4]. 

Figures  la  and  lb  show  a  typical  annealing  run  by  the  proposed  method  and  that  of  Sharman’s  [4].  Both 
the  runs  were  started  with  identical  initial  conditions.  The  temperature  was  initiedized  to  1000.  The  graphs  show 
the  trajectories  of  the  source  frequencies  as  the  annealing  proceeds.  The  values  of  the  cost  function  after  3000 
iterations  were  2.305  in  the  proposed  method  and  2.445  in  Sbarman’s  [4]  method. 

Experiment  3: 

In  this  experiment  there  are  three  point  sources  illuminating  an  array  of  8  sensors  at  directions  of  10,  25  and 
35  degrees  from  the  array  normal.  The  signal  strengths  over  the  additive  white  noise  were  10,  15  and  12  dB’s. 
This  experiment  was  taken  from  Shauman[4].  The  temperature  was  initialized  to  1000.  Figures  2a  and  2b  show 
the  3000  annealing  iterations  by  the  proposed  technique  and  Sbarman’s  [4]  method,  respectively,  with  identical 
initial  conditions.  In  all  our  trials  of  annealing  iteration  the  global  minima  was  reached  within  few  hundreds  of 
iteration  in  the  proposed  method  whereas  in  Sbarman’s  [4]  method  the  estimates  were  varying  erratically  even 
after  few  thousands  of  iterations. 


The  key  difference  between  the  proposed  method  and  th^  existing  method  is  that  the  cost  function 
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Monol..««.ly  decrease*  as  the  iteration  pr,»eda  whereaa  in  Sharman'a  (4)  nrethod  the  coat  function  ™riea 

erratically  aa  long  as  the  temperature  is  high. 
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SECTION  3.3  ;  ONE-STEP  ESTIMATION  OF  ANGLES  OF  AaBJVALS  OF  WIDEBAND  SIGNALS 
Summary 

A  high  resolution  algorithm  for  estimating  the  angles  of  arrivals  of  multiple  wideband  sources  is  studied  for 
this  part  of  the  work.  The  algorithm  is  effective  for  a  dense  and  equally  spaced  array  structure  where  a  bilinear 
transformation  is  utilized  in  the  frequency  domain  for  combining  the  signal  subspaces  at  different  frequencies  for 
coherent  processing.  When  compared  with  existing  coherent  approaches,  the  algorithm  is  non-iterative  in  the 
sense  that  all  the  arrival  angles  can  be  estimated  in  only  one  step  of  the  algorithm.  Existing  algorithms  can 
only  estimate  the  angles  of  a  cluster  of  sources  in  a  particular  direction.  The  proposed  algorithm,  imlike  the 
existing  ones,  does  not  need  the  knowledge  of  the  initial  estimates  of  the  arrival  angles.  The  work  reported  here 
is  a  variation  of  some  eatrlier  work  by  the  author  [32].  Instead  of  using  generalized  eigendecomposition  or  matrix- 
pencil  method,  here  we  pre-  or  post-multiply  the  signal-subspace  matrix  with  the  noise  matrix.  This  enables  us  to 
use  regular  eigendecomposition  routine  to  estimate  the  source  angles.  It  is  also  shown  that  it  may  be  numerically 
more  stable  if  the  coherent  combination  is  not  focused  in  the  center  frequency  the  numerical  value  of  which  could 
be  very  large.  The  new  focusing  matrix  given  here  allows  to  focus  independent  of  the  center  frequency.  The 
performance  of  the  algorithm  is  presented  using  simulated  data. 

I.  Introduction  :  Estimation  of  locations  and  characteristics  of  radiating  sources  using  data  collected  at  the 
output  of  an  array  of  sensors  is  a  frequently  researched  problem  in  signal  processing  [31].  This  problem  has  many 
ramifications  and  different  kinds  of  categorization  are  possible  depending  on  the  assumptions  and  the  knowledge 
or  otherwise  of  the  source  and  noise  characteristics,  array  geometry,  etc.  One  broad  categorization  is  based  on  the 
radiating  sources  having  narrowband  or  broadband  spectra.  A  plethora  of  publications  are  available  in  literature 
and  in  the  next  two  paragraphs  some  of  the  approaches  ate  briefly  outlmed. 

la.  Narrowband  Sources  ;  Estimation  of  frequencies  of  sinusoidal  signals  in  noise  and  estimation  of  angles  of 
arrivals  of  planewaves  with  narrowband  signals  are  related  problems.  Various  existing  techniques  are  primarily 
based  on  Fourier  methods  [1,2],  eigendecomposition  and  singular  value  decomposition  based  methods  [3-10, 2unong 
others],  and  maximum  likelihood  based  methods  [11-13].  Some  of  these  techniques  have  also  been  extended  to 
the  2-D  problem  of  simultaneous  estimation  of  frequencies  2uid  wavenumbers  [14-17]. 

lb.  Broadband  Sources  ;  The  broadband  problem  is  radically  different  from  the  narrowband  problem  in  many 
respects.  As  for  example,  unlike  the  narrowband  case,  a  delay  in  time  cannot  be  as  simply  accounted  for  in  the 
phase  of  the  exponentials  in  the  time  domain  representation.  This  is  because  a  broadband  signal  is  composed  of  a 
continuum  of  frequencies  where  each  constituent  frequency  goes  through  a  phase  change  different  from  any  other. 
Within  any  nonzero  bandwidth  of  the  signal  there  may  be  an  infinity  of  frequencies  all  of  which  experience  changes 
in  their  respective  phases.  Hence  the  broadband  signal  cannot  be  modeled  simply  by  a  sum  of  a  finite  number  of 
exponentials  as  is  the  case  for  narrowband  signals.  For  these  same  reasons  the  data  matrix,  correlation  matrix 
and  the  Hankel  matrix  formed  with  the  time  domain  broadband  data  do  not  possess  the  nice  properties  which 
the  narrowband  case  enjoys  even  when  absolutely  no  background  noise  is  present.  In  fact,  all  these  matrices,  if 
formed  with  noiseless  time  domain  broadband  data,  will  be  of  full  rank,  no  matter  how  large  the  dimensions  of 
the  matrices  are. 

It  should  be  clear  from  the  discussions  in  the  last  puagraph  that  it  will  be  futile  to  attempt  to  solve  the 
angles  of  arrival  problem  for  the  broadband  case  by  directly  implementing  the  high  resolution  methods  in  [1- 
13]  on  the  time  domain  broadband  data.  Coker  and  Ferrara  [1982]  have  provided  a  fairly  thorough  discussion 
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on  the  problems  of  applying  the  high  resolution  methods  developed  for  narrowband  problems  directly  to  the 
broadband  problem.  Wax  et  al  [1984]  have  formulated  both  the  narrowband  and  the  broadband  problems  and 
have  emphasized  the  inconvenience  of  processing  broadband  data  in  the  time  domain. 

The  problems  depicted  above  change  substantially  in  the  spectral  domain  representation  of  wideband  array 
data  [Wax  et  al  1984].  It  will  be  shown  in  Subsection  II  that  the  spectral  density  matrix  at  a  frequency  bin  within 
the  nonzero  bandwidth  of  the  wideband  signal  possesses  many  interesting  properties  which  are  quite  similar  to 
the  narrowband  case.  Namely,  the  rank  of  the  spectral  density  matrix  is  exactly  equal  to  the  number  of  sources 
when  no  noise  is  present.  Also,  the  column  and  the  row  spaces  of  the  spectral  density  matrices  have  familiar 
Vamdermonde  type  structures  which  can  be  compared  to  the  structure  of  the  correlation  matrix  for  narrowband 
case.  Hence  the  structured  matrix  approximation  approach  [17]  could  be  applied  to  the  spectral  density  matrix 
at  a  frequency  bin  to  estimate  the  arrival  angles.  Techniques  based  on  processing  the  spectral  density  matrix  of 
a  single  bin  have  been  proposed  in  literature  [Owsley  and  Swope,  1981].  This  approach  does  not  utilize  all  the 
information  available  in  the  spectral  density  matrices  at  different  frequency  bins  within  the  nonzero  bandwidth 
of  the  signal  and  is  more  suited  in  the  case  of  multipath  propagation  of  a  narrowband  source.  Regarding  the 
spectral  density  matrices  at  different  bins,  although  the  true  ranks  of  all  the  spectral  density  matrices  are  same 
and  equal  to  the  number  of  distinct  sources,  the  column/row  spaces  vary  for  different  frequency  bins.  It  is  rather 
expensive  to  process  at  separate  bins  separately  and  then  combine  the  results  to  obtain  the  angle  estimates. 

In  this  part  of  our  research,  a  spectral  domain  based  method  is  presented  which  coherently  combines  all  the 
spe^.  ial  density  matrices  and  the  angle  estimates  are  obtained  from  the  coherently  combined  spectral  density 
matrix.  Compared  to  an  existing  coherent  method  [27,  28],  the  present  appro2u:h  is  non-iterative  in  the  sense  that 
all  the  incident  angles  are  estimated  at  a  single  step.  Also,  unlike  the  existing  coherent  method,  the  initial  estimates 
of  the  arrival  angles  are  not  required  while  coherently  combining  the  spectral  density  matrices.  The  method  is 
based  on  generalized  eigen-decomposition  and  utilizes  dense  array  approximation  and  a  bilinear  transformation 
for  coherently  combining  the  spectral  density  matrices  of  all  the  frequency  bins.  Some  preliminary  work  on  the 
coherent  method  described  here  was  outlined  in  [32].  Here  we  show  that  by  pre-  or  post-multiply  the  signal- 
subspace  matrix  with  the  combined  noise  matrix  we  can  essentially  avoid  the  computation  of  the  generalized 
eigendecomposition  or  matrix-pencil.  This  enables  us  to  use  regular  eigendecomposition  routine  to  estimate  the 
source  angles.  It  is  also  shown  that  it  may  be  numerically  more  stable  if  the  coherent  combination  is  not  focused 
in  the  center  frequency  the  numerical  value  of  which  could  be  very  leirge.  The  new  focusing  matrix  given  here 
allows  to  focus  independent  of  the  center  frequency. 

Some  other  techniques  for  localization  and  source  spectra  estimation  of  wideband  sources  have  been  reported 
by  several  researchers  in  the  last  two  decades.  See  the  tutorial  papers  in  the  Specifd  Issue  on  Time  Delay  Esti¬ 
mation  [Carter,  1981].  The  conventional  approach  for  the  case  of  a  single  source  is  to  use  a  form  of  generalized 
correlator  [Knapp  and  Carter,  1976]  to  estimate  the  Time-Difference-Of-Arrival  (TDOA)  of  the  signal  at  the 
sensors.  Maximum  likelihood  based  methods  [Bangs  and  Schultheiss,  1973;  Hahn  and  IVettev,  1973;  Wax  and 
Kailath,  1983]  for  single  and  multiple  sources  require  the  knowledge  of  source  and  noise  spc  .( ra  and  are  computa¬ 
tionally  expensive.  Parameter  estimation  based  methods  [Morf  et  al,  1979;  Porat  and  FVicdlander,  1983,  Nehorru 
et  al,  1983;  Su  and  Morf,  1983]  assume  Auto- Regressive  Moving  Average  (ARMA)  models  for  the  received  signals 
and  the  estimated  ARMA  parameters  are  utilized  for  TDOA  estimation.  Computational  complexity  of  these 
methods  is  high  and  the  performance  of  these  approaches  depend  on  the  correctness  of  the  assumed  model  for 
the  unknown  wideband  signals. 

Extending  existing  ideas  in  the  narrowband  problem,  Waxet  al  [1984]  proposed  an  eigen-decomposition  based 
approach  for  wideband  source  localization.  In  their  approach,  the  eigenvectors  of  the  estimated  spectral  density 
matrix  at  each  narrowband  bin  of  the  signal  bandwidth  were  incoherently  combined  to  estimate  the  TDOAs. 
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Recently,  Wang  and  Kaveh  [1984,  1985]  presented  a  coherent  signal  subspace  based  approach  which  avoids  the 
rather  expensive  eigen-decomposition  of  spectral  density  matrices  at  each  frequency  bin.  In  their  approach, 
initial  estimates  of  the  angles  of  arrival  are  used  to  transform  the  signal  eigenspaces  at  different  frequency  bins  to 
generate  a  single  coherent  signsil  subspace  and  then  a  generalized  eigen-decomposition  is  used  for  obtaining  more 
accurate  estimates.  The  algorithm  iteratively  estimates  well  separated  angles  by  focusing  at  different  angles  at 
each  time. 

In  this  work  a  simple  bilinear  transformation  matrix  and  the  approximation  resulting  from  dense  and  equally 
spaced  array  structure  assumption  are  utilized  to  combine  the  individual  narrowband  spectral  density  matrices 
for  coherent  processing.  In  a  related  problem,  Henderson  [1985]  used  a  bilinear  transformation  and  dense  array 
approximation  for  rank  reduction  of  Hankel/Block-Hankel  type  data  matrices.  Henderson  considered  the  angle 
estimation  problem  of  multiple  sources  when  each  source  is  emitting  multiple  narrowband  spectra.  His  formulation 
and  approach  is  completely  in  the  time  domain  and  is  based  on  Singular  Value  Decomposition  of  Prony  type 
Data/Hankel  matrices.  Application  of  his  time  domain  method  to  the  broadband  data  may  encounter  the  problems 
outlined  in  the  introduction  of  this  Section  and  also  in  [Coker  and  Ferrara,  1982].  Also,  no  proof  was  given  to 
ensure  invertibility  of  the  bilinear  transformation  used  and  the  effect  of  the  transformation  to  the  noise  subspace 
is  unclear  since  no  assumption  was  made  on  the  noise  characteristic.  It  is  also  not  obvious  if  the  time  domain 
approach  is  applicable  for  correlated  sources. 

The  coherent  method  described  in  this  Section  is  based  on  spectral  domain  representation.  The  method  is 
non-iterative  and  does  not  require  preprocessing  for  obtaining  initial  estimates  of  the  angles  of  arrival  and  all 
the  angles  are  estimated  from  a  single  step  of  coherent  subspace  computation.  The  performance  of  the  proposed 
method  is  characterized  by  several  simulation  experiments. 

This  Section  is  arranged  as  follows.  In  Subsection  II,  the  coherent  problem  is  formulated  and  a  new  coherent 
algorithm  is  given.  In  Subsection  III  some  observations  on  out  use  of  Structured  matrix  approximation  approach 
is  given.  Finally  in  Subsection  IV,  tehe  results  of  our  simulations  are  shown. 


II.  Problem  Formulation :  The  observed  signal  is  assumed  to  be  composed  of  p  plane  waves  with  an  overlapping 
bandwidth  of  B  Hz.  They  are  sampled  simultaneously  at  the  output  of  a  lineu  array  of  M  (>  p)  equally  spaced 
sensors.  The  signal  received  at  the  ith  sensor  is  expressed  as 
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where  st(  )  is  the  signal  radiated  by  the  ibth  source,  D  is  the  separation  between  the  sensors,  c  is  the  propagation 
velocity  of  the  signal  wavefront,  Vk  is  the  angle  that  the  ibth  wavefront  makes  with  the  line  of  array  and  n,(-)  is 
the  additive  noise  at  the  ith  sensor. 

Representing  both  sides  by  respective  Fourier  coefficients, 

d 

l:=l 

with  wi  =  ^1,1  =  li, . . .  ,li  +  rif ,  where  ui,  and  ut,+nf  the  lowest  and  highest  frequencies  in  B.  In  matrix 
notation. 


R(u;,)  =  A(u;,)S(«0  +  N(w,) 
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where 


R(w/)  =  [/2i(wi) . . 

, .  i2ji#(wj)]‘ 

(4a) 

N(u;,)  =  [Wi(u»,).. 

(46) 

S(ui)  =  [5i(wi) . 

..5p(w,)]‘ 

(4c) 

and  A(u;/)  is  an  M  x  p  direction-frequency  matrix 

/  1 

1  \ 

A(wi)  = 

(4d) 

with  Tj  =  ^sin  u,-  being  the  TDOA  of  the  ith  source.  The  covariance  matrix  of  the  Fourier  coeflScient  vector 
R(u;/)  will  approach  the  spectral  density  matrix  if  the  observation  time  is  large  enough  compared  to  the  correlation 
time  of  the  processes  [30].  With  this  assumption, 

K(w,)  =  A(u;,)P.(a;,)A"(w,)  +  P„(w,)  (5) 

where,  K(a;/),  P,(u;j)  and  Pn((>'i)  &re  the  spectral  density  matrices  of  the  processes  r{(  ),  st(  )  and  n,-(  ),  respec¬ 
tively.  A^(wi)  stands  for  the  transpose  conjugate  of  A(wi).  The  noise  process  is  assumed  to  be  independent  of 
the  sources  and  the  noise  spectral  density  matrix  is  assumed  to  be  known  except  for  a  multiplicative  constant 
(7^.  With  the  above  model  at  hand,  the  problem  is  to  estimate  the  ri’s  from  the  estimated  covariance  matrices 
K{ui)  of  the  received  signal  plus  noise.  Estimates  of  the  angles  of  arrivals  v,-’s  can  then  be  computed  using  the 
relationship  in  (4e). 


Ila.  Previous  Methods  ;  The  two  major  approaches  which  exploit  the  properties  of  the  eigenspaces  of  K{u)i) 
to  estimate  the  arrival  angles  (or  TDOAs)  are  briefly  described  below. 

In  [26],  eigendecompositions  of  K(b;()s  are  performed  in  all  the  frequency  bins  in  B  and  globally  orthogonal 
direction  vectors  zure  obtained  by  computing  and  plotting  any  of  the  following  two  measures, 
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where  av(u;/)’s  are  direction-frequency  vectors  defined  as, 

a„(w()  A  [1  g-i«i^(Af-i)8in»j«  v*(wi)’s  are  the  eigenvectors  corresponding  to  the  smaller 

M  —  p  eigenvalues  of  K(w/).  Note  that  the  distance  measures  in  (6)  and  (7)  require  eigendecomposition  of  K(u>i) 
for  all  /  =  li,  /i  -I-  1,  ■  ■ .,  li  +  nj.  This  obviously  is  a  computationally  expensive  procedure.  Instead  in  [27] 
and  [28],  a  transformation  matrix  was  employed  to  reduce  this  burden.  Using  initial  estimates  of  the  possible 
angles  of  arrivads,  transformation  matrices  T«(u;/)  were  formed  such  that  direction-frequency  matrices  of  all  the 
frequency  bins  are  transformed  to  the  center  frequency  We  in  B,  i.e. 


Te(u;;)A(w/)  =  A(k;e),  I  =  h,li  +  I, . .  .,li  +  nj 


(8) 
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Then  using  the  transforming  matrices  T«(ci;/),  I  =  li,li  +  1, . .  .,li  +  nj ,  all  the  spectral  density  estimates  are 
combined  in  the  following  manner, 

*1+"/ 

G  A  ^  T«(a;,)K(a;,)Tf  (w,) 

=  A(uc)G»A^(wc)  +  alGn  (9) 

where 

i+ny  li+n/ 

G.=  Y1  =  53  Te(u,,)P„(u;,)Tf  (w,)  (10) 

i=t  1=1, 

Next,  a  coherent  signal  subspace  theorem  [28]  for  the  matrix  pencil  (G,  G„)  is  utilized  to  estimate  the  angles  in 
the  direction /s  of  v  by  computing  the  maxima  of  the  following  measure. 


J(u)  = 


E"p+ilK"(‘^c)e*(u;c)||= 


where  ep+i{uc),ep+2(!^c),  ■  ■  •  .sm  (^e)  the  generalized  eigenvectors  of  the  matrix  pencil  (G,  Gn)  corresponding 
to  the  smallest  M  —  p  eigenvalues.  If  the  arrival  angles  are  well  separated,  different  transformation  matrices 
needs  to  be  utilized  to  ‘focus’  to  particular  directions  using  initial  estimates  of  the  angles  [27].  The  coherent 
transformation  described  here  does  not  require  any  initial  estimates  of  the  arrival  angles. 

Ilb.  Proposed  Method  ;  The  coherent  signal  subspace  processing  technique  described  above  requires  the 
knowledge  of  the  initial  estimates  of  the  angles  of  arrivals  to  form  the  transformation  matrix  Tv(u;().  To  avoid 
this  requirement,  a  different  approach  that  utilizes  a  bilinear  transformation  and  dense  array  approximation  to 
form  the  transformation  matrices  is  presented  here. 


Synthesizing  a  Bilinear  Transformation  Matrix  :  Let  B  be  an  A/  x  M  matrix  constructed  from 
the  coefficients  of  the  M~1  th  order  z-polynomials  Pi(z)  =  (1  +  —  r)*“*,)k  =  1,  2,  ..., 

M.  The  elements  of  the  ibth  row  of  the  matrix  B  are  the  coefficients  of  Pk(z)  taken  in  ascending  order  of  z. 
The  coefficients  of  pt(z)  can  be  found  by  convolving  the  coefficients  of  the  polynomi2js  (1  +  and  (1  — 

For  example,  a  4  x  4  B  matrix  will  have  the  form. 


Bm=4  = 


13  3  1 

11-1-1 
1-1-1  1 
1-3  3-1 


Proposition  :  The  M  x  M  matrix  B,  defined  above,  is  nonsingular  and  hence,  its  rows  are  linearly  independent. 
Proof  :  Let  Z  be  a  real  M  x  M  V2uidermonde  matrix  of  the  form 
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where  z%^  Zj  —I  So  Z  is  nonsingular  and  its  non-zero  determinant  is  given  by  det  Z  =  ~ 

Zj).  Since  the  fcth  row  of  B  are  the  coefficients  of  M-1  th  order  z-polynomials  pt(z)  =  (1  z)*^“*(l  — 

z)‘‘~^,k  =  1,  2,  . . . ,  M  and  Z  has  Vandermonde  structure,  it  can  be  easily  shown  that, 


so  that, 


BZ  = 


(1+^*/)"-* 

*(1—1) 

|V-1 

V'*'**/  '  V'*""/ 


=  JJ  2(z,  -  Zj) 

=  detB  detZ  (15) 

so  det  B  =  risf  >i»>i  2  =  2  '  a"  ^  ^  0.  Hence,  B  is  nonsingular  and  its  rows  are  linearly  independent.  0 

A  similar  bilinear  tr2uisformation  matrix  was  given  in  [29]  though  from  the  manner  in  which  it  was  constructed, 
it  was  rather  difficult  to  appreciate  its  properties. 

Now,  since  A(wj)  in  (4a)  has  a  Vandermonde  structure,  following  the  S2une  steps  as  in  (15)  it  is  easy  to  see 


BA(u;()  = 
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where,  E(w;)  denotes  the  diagonal  matrix  in  (16).  If  the  sensor  to  sensor  separation  D  is  small  compared  to  all 
wavelengths  in  B,  then  tan(!*!^)  ~  j.  Using  this  approximation  and  premultiplying  BA((>;i)  by  an  M  x  M 
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diagonal  matrix  D(j^)  whose  (m,  m)t/»  term  is  given  by  obtain, 


D(— )BA(w/)  ~ 

ui 


A  A'E(wi)  (18) 

The  matrix  A'  whose  columns  are  the  transformed  direction-frequency  vectors  has  no  dependence  on  the  fre¬ 
quencies  at  all!  It  may  be  pointed  out  here  that  in  [32],  we  had  proposed  using  a  center-frequency  dependent 
coherent  transformation  matrix,  D(^)  whose  (m,m)th  term  was  given  by  {^)"’  where  We  =  2jr/c,  and  fc 
is  the  midband  frequency  in  B.  Also  note  that  A'  has  a  Vandermonde  structure  and  its  columns  are  linearly 
independent  as  long  as  Ti  jt  tj  for  t  ^  j. 

A  Center- Frequency-Independent  Transformation  Matrix  and  Coherent  Processing  :  The  new  transformation 
matrix  which  we  define  as  T'(w/)  A  D(^)B,  does  not  depend  on  the  arrival  angles  or  the  center  frequency  and 
since  B  is  nonsingular  and  D(;37)  diagonal  with  nonzero  diagon^d  elements,  T'(a//)  is  also  nonsingular.  Using 
these  transformation  matrices  T'(w/),  /  =  /i,/i  -1-  1 . . .  ,/i  -I- «/,  all  the  spectral  density  estimates  can  now  be 
combined  in  the  following  manner, 

li+nj 

G'A  5^  T'(a;,)K(a;,)T'"(u/,) 


where 


=  a'g;a'"  + 


Ij+n/ 

G;  =  E(u>,)P.(wi)E"(u;,) 

/=J, 


G;  =  53  T'"(w,)Pn(u;,)T'"(u;,) 


Next,  the  coherent  signal  subspace  theorem  for  the  matrix  pencil  (G',  G^)  is  utilized  to  estimate  all  the  angles 
of  arrivals  by  computing  the  maxima  of  the  following  measure, 

where  ^p+i,^p+2>  ■  ■  ■  generalized  eigenvectors  of  the  matrix  pencil  (G',G'„)  correspond¬ 

ing  to  the  smallest  M  —  p  eigenvalues  and  a(,’8  are  the  new  direction-frequency  vectors  defined  as, 
aj,  A  [1  ysinv  ...  (^sinv)*^"*]V 

From  (19)  it  is  adso  obvious  that  since  GJ,  is  known  to  be  positive  definite  [28],  pre-multiplying  G'  by  G'„, 
we  get, 

Gpr£AG;-‘a'G',A'"  all.  (23) 

Similarly,  post-multiplying  G'  by  G'„  and  taking  transpose  conjugated,  we  get, 

G/.ostAG;-‘a'G'/a'"  -H  all.  (24) 
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It  can  be  shown  that  both  Gpre  and  GposT  possess  exactly  same  set  of  eigenvectors  and  hence  regular  eigen- 
decomposition  routine  on  either  of  these  matrices  would  yield  similar  maxima  as  in  (22). 

III.  Discussion  on  Structured  Approach  :  If  we  look  at  the  matrix  structures  of  G',  or  G'pQgj.^ 

in  (19),  (23)  and  (24),  respectively,  the  signal  part  of  each  one  of  these  matrices  have  a  Vandermonde  Matrb:  A' 
as  a  matrix  factor  appearing  in  the  front.  Such  Vandermonde  structures  had  been  exploited  successfully  In  [l^] 
for  estimation  of  frequencies  and  wavenumbers  using  structured  matrix  approximation  approach.  But  there  is  a 
key  difference.  The  Vandermonde  matrix  appearing  in  [17]  had  complex  exponentials  as  their  elements.  Hence 
complex-conjugate-symmetry  had  to  be  imposed  on  the  estimated  polynomial  coefficients  so  that  the  roots  lie  on 
the  unit  circle.  But  in  the  present  case  the  elements  of  A'  are  all  real.  The  constraints  to  enforce  the  roots  to 
lie  on  the  real  axis  are  all  non-linear  in  nature  [33,  34].  Incorporation  of  these  constraints  in  the  optimization 
criteria  appearing  in  [17]  would  lead  to  the  use  of  general  non-linear  optimization  criteria  such  as  Fletcher-Powell 
or  Gauss-Newton-Marquardt  methods.  Instead  we  had  attempted  to  use  the  structured  approach  directly  without 
the  conjugate  symmetry  constraints  on  the  polynomial  coefficients.  The  initial  results  were  encouraging  and  at 
high  SNR  the  approach  gave  excellent  results.  But  as  the  SNR  was  lowered,  the  roots,  without  any  constreiints, 
started  to  become  complex  in  order  to  minimize  the  unconstrained  criteria.  These  results  were  not  as  good  as 
the  results  we  obtained  using  eigendecomposition  method  as  shown  below. 

IV.  Simulation  Results  :  The  same  simulation  examples  as  presented  in  [27]  and  [28]  were  used  to  evaluate  the 
performance  of  the  proposed  method.  In  all  the  simulations,  a  linear  array  of  M  =  16  equally  spaced  sensors  was 
used.  The  spacing  between  two  consecutive  sensors  is  D  =  where  /^  =  ^  =  the  midband  frequency  in 
B  and  r  is  the  ratio  of  the  wavelength  at  the  midband  frequency  fc  and  the  inter-element  spacing  D.  The  source 
signals  are  temporarily  stationary  bandpass  white  Gaussian  processes  with  zerr  mean.  The  noise  processes  at 
each  sensor  are  stationary,  statistically  independent,  identical  white  Gaussian  bandpass  processes  with  zero  mean 
and  are  independent  of  the  source  processes.  The  sources  and  the  noise  processes  have  the  same  bandwidth  of 
B  =  100.  The  same  sampling  specifications  and  data  segmentation  as  described  in  [27]  and  [28]  were  used.  The 
received  signal  plus  noise  processes  were  sampled  at  each  sensor  at  80Hz  and  then  divided  into  64  segments  of 
64  samples  each  and  then  each  segment  was  transformed  into  frequency  domain  by  unwindowed  FFT  to  obtain 
ny  1  =  33  narrowband  components.  The  covariance  matrix  estimate  at  the  jth  frequency  was  estimated  as 

1  ^ 

=  ;^EXn(w,)X"(a;,),  /  =  /i,/j-Pl,...,/i-l-n;  (25) 

n=l 

where,  the  vector  Xn(w;)  is  the  Ith  component  of  the  Fourier  transformed  data  for  the  nth  segment.  The  signal 
to  noise  ratio  is  defined  as  the  ratio  of  the  signal  po  .ver  of  one  source  and  the  power  of  the  the  noise  at  the  output 
of  a  single  sensor. 

Figure  1  shows  the  results  for  the  case  of  two  uncorrelated  sources  at  vj  =  9.0®  and  vj  =  12.0*  for  SNR  = 
10  dB  and  r  =  4.  The  Figure  shows  overlapped  plots  of  5  independent  runs.  The  source  angles  are  clearly  well 
resolved  at  this  low  SNR.  In  Figure  2  the  results  of  the  case  of  one  signal  being  well  separated  from  two  closely 
spaced  sources  is  shown.  Three  independent  sources  with  sinvi  =  0.15,  sinv2  =  0.2  and  sinva  =  0.4  were  used 
with  SNR  =  lOdB  and  r  =  3.  The  results  of  the  five  runs  show  that  all  the  three  angles  are  well  estimated  by  one 
step  of  the  modified  coherent  signal  subspace  processing  method  described  above,  whereas  in  [27]  at  least  three 
iterations  were  required  to  ensure  the  angular  positions  of  the  three  sources.  Also  the  focusing  transformation 
matrix  inherently  required  previous  estimate  of  the  arrival  angles.  The  results  for  the  case  of  two  completely 
correlated  sources  [-3]  are  shown  in  Fig.  3  for  SNR  =  lOdB  and  r  =  4.  The  results  show  that  the  angles  were 
resolved  in  all  cases  though  there  seems  to  be  some  vau'iability  in  the  estimate  of  the  source  at  vj  =  12.0®. 
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I  :  The  ipetiel  ipectrum  of  five  run*  when  two  uncorrelated  wideband 
goufces  are  at  yi  *  fl*  and  vj  *  12*.  r  ®  4  and  SNR  *  lOdB. 


Fig.  2  :  The  spatial  spectrum  of  five  runs  when  three  urxorrelated  wideband 
sources  are  at  sinvi  b  0.15,  sinva  b  0.2  and  sinva  b  0.4.  r  b  3 
and  SNR  b  lOdB.  Note  that  all  the  angles  are  estimated  at  a  single 
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Fif .  3  :  The  ipatiel  ipectrum  of  five  nini  when  two  correlated  wideband 

are  at  vi  «  9*  and  «  12*.  r  *  4  and  SNR  «  lOdB. 
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SECTION  3.4  :  ORDER  RECURSIVE  PARAMETRIC  BiSPECTRUM  ESTIMATION 
SUMMARY 

Order  recursive  computation  of  AR  parameters  from  cumulants  is  addressed.  If  the  Cumulant  matrix  is 
neither  Toeplitz  nor  symmetric,  it  is  shown  that  using  a  block  matrix  inversion  formula  due  to  Frobenius  and  Schur 
[1],  the  inverse  of  the  p-dimensional  cumulant  matrix  can  be  updated  from  the  (p  —  l)-dimensional  inverse  with 
O(p^)  operations.  When  compeired  to  batch  mode  computation,  the  proposed  algorithm  reduces  the  computational 
requirement  for  order-recursive  calculation  of  the  AR-parameters.  When  the  cumulant  matrix  is  non-symmetric 
Toeplitz  also,  further  reduction  in  computation  is  obtained  using  an  algorithm  due  to  IVench  [7]. 

I.  Introduction 

With  the  advent  of  faster  and  cheaper  digital  computers,  the  signal  processing  community  is  paying  a  timely 
and  much  well  deserved  attention  to  spectrum  estimation  methods  based  on  higher-order  statistics  [2-5].  Many 
problems  that  are  known  to  be  intractable  in  the  second-order  domain  of  auto-correlation  and  Power  Spectrum, 
can  be  rather  easily  handled  in  the  higher-order  domains  [see  examples  in  2-4],  albeit  with  increased  computational 
load.  The  last  decade  has  seen  a  renewed  interest  in  developing  algorithms  based  on  higher-order  statistics  rmd 
with  the  cost  of  computing  coming  down,  these  methods  will  be  practicable  in  the  near  future. 

Following  the  footsteps  of  success  in  Power  Spectrum  estimation,  linear  time-invarirmt  model-based  paramet¬ 
ric  approaches  have  been  found  to  be  very  effective  in  the  higher-order  domains  also  [2-8].  Specifically,  estimation 
of  the  AR  parameters  from  the  third  moments  plays  a  key  role  in  AR  and  ARMA  approaches.  Almost  all  the 
presently  available  techniques  assume  the  knowledge  of  the  model  order  (p)  and  the  AR  parameters  are  computed 
in  batch  mode  by  computing  the  inverse  of  a  p  x  p  third  moment  or  cumulant  matrix  (which  may  or  may  not 
be  Toeplitz).  In  practice,  of  course,  the  exact  model  order  will  not  be  known  and  one  has  to  determine  it  from 
the  available  data.  The  order  determination  remains  an  open  problem  but  any  order  selection  approach  [6,  for 
example]  would  most  likely  require  the  estimation  of  the  AR  parruneters  for  each  order  starting  from  order  1. 
Hence,  at  each  order  k  =  1,2, . .  .,p,  one  would  have  to  invert  &kxk  cumulant  matrix.  But  the  cumulant  matrix 
usually  contains  a  good  deal  of  structure  in  the  sense  that  the  matrix  at  lower  model  order  remains  embedded 
in  the  matrix  of  higher  model  order.  The  main  purpose  of  this  work  is  to  exploit  the  structure  in  the  cumulant 
matrices  for  order-recursive  calculation  of  the  inverse  of  the  cumulant  matrix  for  the  ib‘*  order  based  on  the  inverse 
calculated  at  {k  —  1)‘*  order.  The  AR  parameters  aue  also  computed  order  recursively  in  the  process. 

Obviously,  the  motivation  of  this  work  comes  from  the  well  known  Levinson  recursion  algorithm  for  computing 
the  AR  parameters  using  second  order  statistics  or  autocorrelation  estimates.  It  should  be  emphasized  here  though 
that  unlike  the  autocorrelation  matrix  arising  in  the  normal  equations,  the  cumulant  matrix  is  not  necessarily 
Hermitian  Toeplitz.  Furthermore,  the  nice  orthogonality  properties  on  which  the  Levinson  recursion  is  based  on 
do  not  exist  here.  Instead,  a  block  matrix  inversion  lemma [1],  which  is  applicable  to  any  general  invertible  matrix, 
is  utilized  here.  This  enables  recursive  computation  of  the  AR  parameters  at  increasing  model  orders  without 
having  to  invert  the  whole  cumulant  matrix  from  scratch  for  each  model  order.  In  some  cases,  the  cumulant 
matrix  may  also  have  Toeplitz  structure  and  then  the  order-recursive  computation  of  the  inverse  of  the  cumulant 
matrix  and  the  corresponding  AR-parameter  vector  can  be  performed  using  an  algorithm  due  to  Trench  [9,10]. 
if  the  underlying  process  is  of  high  but  unknown  model  order,  the  proposed  schemes  will  result  in  considerable 
computational  savings. 

This  Section  is  arranged  as  follows:  In  Subsection  H,  the  problem  formulation  is  given.  In  Subsection  III, 
two  order  recursion  algorithms  are  given.  The  first  one  is  applicable  for  cumulant  matrix  with  general  structure 
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whereas  the  second  one  is  meant  for  Toeplitz  cumulant  matrix.  Finally,  some  discussions  regarding  the  algorithms 
are  given  in  Subsection  IV. 

II.  PROBLEM  Formulation 


The  third  moment  (or  cumulant)  sequence  of  a  real  discrete  process  x(n)  is  defined  as, 

c(/,m)  A  E{x(n)x{n  +  l)x{n  +  m)}.  (1) 

E{  }  denotes  the  expectation  operation.  We  will  assume  the  process  x(n)  to  be  zero  mean  and  third-order 
stationary.  Now  consider  the  case  when  i(n)  is  a  kth  order  auto-regressive  process,  i.c.,  the  present  value  of  i(n) 
is  formed  by  the  linear  combination  of  the  past  k  samples  plus  driving  noise  sample  as  described  by  the  following 
regression  formula: 

k 

i=l 

The  noise  samples  w{n)  are  assumed  to  be  zero-mean,  white  and  non-Gaussinaly  distributed  with 

denoting  the  variance  and  £{tn^(n)}  =  0  denoting  the  third  moment.  In  (2),  causal  AR  model  is  assumed,  though 

the  non-causal  case  can  also  be  handled  by  the  order-recursion  algorithm  considered  here. 

Multiplying  both  sides  of  (2)  by  x(n  —  1)x(ti  —  m)  and  taking  expectation  we  get, 

k 

c{~l,-m)  =  q(«)c(i  —  l,i-  m)  +  06(1,  m),  for/,m>0.  (3) 

tsl 

These  equations  are  known  as  the  ihird~order  recursion  (TOR)  equations  [2].  There  are  many  possibilities  of 
writing  the  above  set  of  equations  in  the  matrix  equation  form  in  order  to  solve  for  the  AR  parameters.  In  [2], 
[3]  and  [5]  two  possibilities  which  include  the  /  =  m  =  0  point  (t.e.,  the  origin  or  of  /  —  m  plane)  were  proposed. 
In  the  first  case,  they  chose  the  I  =  m  line  to  obtain  the  following  matrix  equation  for  AR  model  order  k  : 


(  c(0,0) 

c(-l,-l) 

\c{-k,-k) 


c{k,k) 

c{k-l,k~l) 

c(0,0) 


A  b(*). 


(4a) 

(46) 


Note  that  the  (k  -|- 1)  x  (k  ■+  1)  cumulant  matrix  Cy is  Toeplitz  but,  in  general,  not  symmetric.  The  Toeplitz 
structure  is  denoted  by  the  subscript  r  in  (4).  In  another  representation  considered  in  [2],  [3]  and  [5],  /  and  m 
values  are  chosen  in  the  wedge  region  contiguous  to  the  origin  between  the  /  =  m  line  and  the  1  =  0  line.  Their 
choice  of  /  and  m  were  as  follows  : 


/  =  0,...,L,  and  m  = 


/,  for  /  <  I 
M,  forf=L, 


(5a) 


where  L  and  M  are  chosen  such  that  M  <  L,  and 


k  =  1  +  Af  -k 


(L-l)(L-k2) 

2 


(56) 
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Note  that  the  cumulant  matrix  formed  in  the  second  case  is  not  Toeplitz.  For  example,  if  order  k 
have  L  =  2  and  M  =  0  and 


=  3,  one  would 


c(0,0) 

c(-l,0) 

c(-l,-l) 

c(-2,-l) 


c(2,2)  c(3.3) 
c(l,2)  c(2,3) 
c(l.l)  c(2,2) 
c(0,l)  c(l,2) 


A  b(3>. 


The  subscript  nt  denotes  the  non-Toeplitz  structure  of  this  choice  of  cumulant  matrix.  Certainly,  the  above  two 
choices  are  not  unique  for  determination  of  the  AR-parameters  because  according  to  equation  (3)  one  could  select 
any  set  for  the  bottom  k  equations  as  long  as  l,m  >  0  (the  top  equation  for  /  =  m  =  0  is  needed  to  determine 
0).  In  fact,  one  can  form  another  set  of  equations  with  Toeplitz  cumulant  matrix  utilizing  the  third  moments 
on  any  straight  line,  I  =  m  +  d,  where  d  >  0  is  a  constant.  Also  for  order  k  =  3,  one  could  use  the  equation 
with  1  =  2  and  m  =  0  to  replace  the  fourth  row  equation  in  (6).  With  true  cumulant  values,  any  of  these  sets  of 
equations  will  give  the  correct  AR  parameters  as  solutions. 

Now,  let  us  look  at  the  cumulant  matrix  for  the  p  =  4  case  corresponding  to  (5),  then. 
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c(-l.O) 
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Note  that  the  cumulant  matrix  of  (6)  reappears  in  (7)  as  the  upper  left  corner  partitioned  block.  The 
situation  will  be  the  same  if  we  used  the  Toeplitz  structure  of  (4)  or  any  other  set  of  equations  satisfying  (3). 
This  property  that  the  cumulant  matrix  at  higher  model  order  contains  the  matrix  of  lower  order  will  be  utilized 
later  in  the  recursion  algorithm. 

Whether  the  general  non-Toeplitz  structure  of  (6)  or  the  Toeplitz  structure  of  (4)  is  used,  we  can,  in  general, 
write  the  set  of  ;fc  1  equations  as, 


c(0,0)  c(l,l) 


c{k,k) 


The  elements  of  and  will  be  denoted  generically  as. 
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The  AR  parameters 


(10) 


eure  found  by  solving  the  lower  k  equations  of  (8),  t.e., 


(11a) 

(116) 

(12a) 

(126) 

(13a) 

(136) 

(13c) 

(13d) 


This  relationship  of  the  inverses  at  successive  model  orders  are  utilized  in  the  recursive  algorithms  given  later. 
Now  we  are  ready  for  the  recursive  update  algorithm  for  calculating  the  inverse  of  the  cumulant  matrix  as  well 
as  the  AR  parameters  at  (/b  +  l)th  order  based  on  the  same  at  ifcth  order.  First  we  will  give  the  most  general 
algorithm  and  later  the  Toeplitz  case  will  be  considered. 
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IIL  The  Recursion  algorithms 

ALGORITHM  1  :  (  General  Case)  We  will  use  the  notation  of  equation  (9)  in  describing  the  algorithm. 

=  [ci] 

c(o-‘  =  J_ 

Cii 


^(1) 
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- 
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k 

=  c(0,0)  +  ]^a|*^c(t,«). 

i=l 

ALGORITHM  2  :  (  Toeplitz  Case)  In  this  case  we  use  the  notation  of  eq.  (4)  : 

=  [c(-l,-l)] 
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k 

=  c(0,0)  +  ^a|*^c(»,i). 

«=i 

Though  it  is  not  explicit  in  the  algorithm  development,  it  must  be  emphasized  here  that  a  Toeplitz  matrix  being 
persymmetric  (i.e.,  symmetric  about  the  cross  diagonal),  its  inverse  is  persymmetric  also  [2].  Hence  computation 
of  the  elements  in  either  the  upper  left  or  the  lower  right  triangle  of  the  inverse  matrix  will  suffice.  This  will 
provide  further  computational  savings  for  the  Toeplitz  case.  For  more  details  on  this  algorithm  see  [9-12]. 

IV.  Discussion 

Both  the  recursion  algorithms  outlined  here  use  the  inverse  of  the  cumulrmt  matrix  at  the  previous  order  to 
update  the  matrix  inverse  of  successive  orders  and  produce  the  AR  parameter  estimates,  for  ifc  =  1, 2, . . .  ,p. 
If  the  true  values  of  the  third  order  moments  are  not  available  then  these  must  be  estimated  from  (possibly 
noisy)  data.  We  should  point  out  that  if  the  true  model  order  is  known  exactly,  there  is  no  gain  in  using  the 
proposed  recursion  algorithms  and  the  direct  batch  mode  inverse  will  be  sufficient.  But  if  the  true  order  is  not 
known  then  one  may  have  to  check  at  different  and  increasing  model  orders.  In  such  cases,  the  proposed  update 
algorithm  will  reduce  the  cost  of  computation  at  higher  model  orders.  As  noted  in  [1,  pp  188-191]  for  the  general 
case,  if  the  ib-dimensional  inverse  is  to  be  computed  based  on  the  known  (k  —  l)-dimensional  inverse,  one  would 
save  {k  —  1)^  computations.  When  compared  to  batch  mode,  which  normally  requires  k^  operations,  the  update 
would  require  k^  —  (k  —  1)^  =  ik^  —  3/fc  -f-  1  operations.  The  computational  savings  will  be  even  more  for 
the  Toeplitz  case  [9,10].  It  may  also  be  pointed  out  that  TOR  type  equations  also  appear  in  anti-causal  AR 
models  and  in  ARMA  parameter  estimation  [3,  eq  (5.31c)]  with  third-order  statistics  and  the  proposed  recursion 
algorithms  will  be  applicable  in  those  cases  also.  Furthermore,  the  recursion  algorithms  are  ailso  applicable  for 
order-recursive  estimation  of  the  AR  part  of  ARMA  models  from  autocorrelation  estimates  [12]. 
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SECTION  3.5  :  VOWEL  AND  PHONEME  RECOGNITION  WITH  A  TIME-DELAY  NEDBA.L  NETWORK 
Summary 

A  Time-Delay  Neural  Network  architecture  is  used  for  speaker  independent  recognition  of  the  long  vowel 
sounds  a,  e,  i,  o  and  u.  The  present  work  extends  the  work  in  [1-4]  by  training  the  network  with  the  LPC  coefficients 
of  speech  along  with  FFT  bin  energies  and  by  allowing  longer  and  variable  length  utterances.  Furthermore, 
the  training  has  been  performed  with  multiple  speakers  using  English  rather  than  Japanese  speech.  With  this 
modifications,  we  have  obtained  100%  recognition  of  all  vowels  spoken  by  two  speakers. 


I.  Introduction 

Many  attempts  have  been  made  to  use  neural  networks  for  speech  recognition.  Most  of  these  attempts  have 
only  been  successful  in  limited  situations.  Often  one  of  the  limiting  factors  has  been  the  requirement  of  precise 
temporal  alignment.  This  is  a  necessary  requirement  for  most  recognition  algorithms  since  recognition  is  achieved 
by  matching  known  features  with  observed  features.  Neural  networks  are  not  different  in  this  respect.  They  have 
been  found  to  be  very  sensitive  to  spatial  shifts.  As  a  solution  to  this  problem  many  speech  recognizers  have  made 
use  of  segmentation  algorithms  to  pre-segment  utterances  in  order  to  find  the  beginning  and  ending  of  a  sound, 
word  or  phrase  precisely  [7].  Once  this  is  done  the  signal  features  go  through  nonlinear  time  warping  algorithm 
in  order  to  normalize  the  duration  of  the  utterance  of  interest,  thus  achieving  temporal  alignment.  Segmentation, 
however,  is  erroneous  in  itself  and,  when  in  error,  causes  recognition  failure  due  to  mismatch  between  the  training 
utterance  and  the  testing  utterance.  From  this  it  is  apparent  that  a  recognition  system  that  is  not  dependent  on 
temporal  alignment  needs  to  be  developed  before  speech  recognition  can  be  a  reliable  process. 

One  approach  to  eliminate  the  problem  of  temporal  alignment  as  discussed  above  would  be  to  classify  the 
features  based  on  a  single  time  frame  of  an  utter2uace.  Although  this  would  solve  the  alignment  problem  the 
classification  performance  will  certainly  be  degraded  since  much  of  the  information  about  the  signal  is  found  from 
the  manner  in  which  the  features  vary  from  one  time  frame  to  the  next.  For  this  reason  it  seems  that  this  idea 
needs  to  be  modified  such  that  the  recognition  algorithm  looks  at  multiple  time  frames  at  one  time  in  order  to 
make  use  of  the  inter-frame  variations  of  the  input  features.  This  can  be  achieved  by  having  a  large  window  that 
encompasses  a  number  of  time  frames.  This  large  window  is  then  passed  over  the  incoming  data  looking  for  local 
movements  in  the  features,  thereby  eliminating  the  need  for  any  temporal  alignment  as  long  as  the  time  frames 
are  sufficiently  small.  The  requirement  for  a  small  time  frame  is  to  ensure  that  the  signal  can  be  considered  to 
be  stationary  during  that  time.  For  speech  it  has  been  found  [7]  that  time  frames  of  10ms  are  adequate  to  meet 
this  requirement. 

The  alignment  problem  and  its  solution  as  discussed  above  are  quite  well  suited  for  a  peirticular  type  of 
neural  network  architecture  known  as  the  Time-Delay  Neural  Network  (TDNN).  A  number  of  recent  studies  [1- 
4,8,9]  have  demonstrated  the  ability  of  TDNN  to  handle  the  dynamic  nature  of  speech  for  phoneme  recognition. 
As  with  most  neural  network  architectures,  complex  constraint  satisfaction  is  obtained  via  a  massively  parallel 
configuration  but  in  the  case  of  TDNN  the  constraints  can  occur  over  a  period  of  time  as  desired  for  speech 
recognition.  This  gives  the  network  the  ability  to  represent  relationships  between  events  in  time  and  at  the  same 
time  TDNN  allows  for  invariance  of  these  events  under  translation  in  time.  With  this  translation  invariance, 
the  network  does  not  require  precise  temporal  alignment,  therefore  the  network  is  able  to  simply  scan  the  input 
features  for  clues.  This  is  a  necessary  requirement  for  efficient  continuous  speech  recognition. 

The  work  presented  here  was  motivated  by  the  work  reported  by  Waibel  and  his  colleagues  [l-4,8,9]  on  TDNN 

^This  work  was  partly  supported  by  AFOSR  Grant  AFOSR-89-0291 
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for  phoneme  recognition.  Our  goal  was  to  improve  the  performance  of  TDNN  by  increasing  the  amount  of  data 
supplied  to  the  network.  This  was  done  hy  including  the  LPC  coefficients  along  with  the  FFT  bin  energies.  We 
have  also  trained  the  network  with  utterances  having  variable  durations.  We  feel  that  it  is  an  important  aspect 
due  to  the  extreme  variability  in  speaking  rate  of  different  speakers  at  different  situations.  Also,  restricting 
the  utterauice  length  to  150ms  to  make  a  decision  about  a  vowel  may  limit  the  network’s  performsmce  because 
depending  on  speadcing  style  and  the  spoken  word  the  ISOms  portion  may  not  contain  all  the  key  information 
to  recognize  the  vowel.  We  have  also  trained  the  network  with  more  than  one  speaker  and  have  obtained  100% 
recognition  rate. 

The  rest  of  this  Section  is  organized  as  follows.  A  brief  introduction  to  the  TDNN  architecture  is  given 
in  Subsection  II.  Subsection  III  gives  a  description  of  the  data  used  for  the  simulation.  In  Subsection  IV,  the 
experiment  is  discussed  and  some  results  are  given  along  with  some  comparison  to  related  work.  Finally,  in 
Subsection  V  some  conclusions  and  future  direction  of  this  work  are  given. 

II.  Network  Architecture 

The  network  architecture  used  in  this  work  is  similar  to  that  used  by  Waibel  and  his  colleagues  [1-4, 8, 9]  for 
phoneme  recognition  of  Japanese  speech.  The  differences  lie  mainly  in  the  number  of  inputs  applied  to  the  network 
and  the  ability  to  accept  longer  and  variable  length  utterances.  A  brief  description  of  the  network  architecture 
follows. 

The  TDNN  is  basically  a  modified  version  of  the  standard  multi-layer  feed- forward  neural  network  [6].  The 
difference  between  the  two  networks  lies  in  the  way  the  layers  are  interconnected  and  in  the  incorporation  of  the 
time  delays.  The  input  to  the  network  is  divided  into  a  sequence  of  10ms  time  segments.  As  the  time  segments  are 
fed  to  the  network,  they  pass  through  a  series  of  two  time  delays  as  shown  in  Figure  1.  This  series  of  time  delays 
acts  as  a  time  window  passing  over  the  data  which  effectively  gives  the  network  30ms  of  data  at  any  one  time, 
as  shown  in  Figure  2.  This  entire  window  is  fully  forward  connected  to  a  hidden  layer  of  eight  units.  The  hidden 
units  are  then  combined  similarly  in  a  five  frame  window,  in  the  same  fashion  as  in  the  input  layer.  This  frame 
is  then  fully  forward  connected  to  the  output  layer.  This  then  gives  a  sequence  of  outputs  which  are  averaged 
together  for  a  final  classification  for  the  entire  utterance. 

III.  Data  Description 

The  database  used  for  this  work  was  the  TI  4&-word  speech  database.  The  input  to  the  network  consists  of 
28  values.  These  values  consist  of  16  spectral  coefficients,  12  auto-regressive  (AR)  coefficients  and  average  power. 
The  spectral  coefficients  are  found  by  taking  the  Fast  Fourier  TVansform  (FFT)  of  each  10ms  time  frame  of  data. 
This  data  is  then  combined,  as  shown  in  Table  1,  to  form  the  coefficients.  The  computation  of  the  AR  coefficients 
is  a  standard  procedure  in  digital  signal  processing  [7].  These  coefficients  are  the  linear  prediction  coefficients 
that  model  the  vocal  tract  as  an  all  pole  filter.  It  is  well  known  that  10- 12th  order  filter  (10-12  LPC  coefficients) 
is  sufficient  to  model  the  vocal  tract.  We  have  used  12  coefficients  in  our  simulations.  The  complete  set  of 
coefficients  is  normalized  (excluding  the  average  power)  such  that  the  average  value  for  each  set  of  coefficients  is 
zero  and  lies  within  the  range  of  -1  and  -1-1. 

One  key  difference  in  our  work  when  compared  to  the  ones  reported  in  [1-4]  is  the  utilization  of  complete 
utterances  to  make  the  recognition  decisions.  The  referenced  works  restrict  this  length  to  150ms  in  duration. 
We  feel  that  restricting  the  utterance  length  may  limit  the  performance  of  the  network.  This  is  motivated  by 
examining  the  utterance  length  statistics  for  the  data  used  in  this  experiment,  as  shown  in  Table  2.  From  the 
table  it  is  clear  that  there  are  considerable  variations  in  lengths  of  different  utterances.  It  may  also  be  observed 
that  almost  all  utterances  are  much  longer  than  the  150ms,  as  used  in  [l-4,8,9].  For  this  reason,  we  have  trained 
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and  tested  the  TDNN  with  complete  utterances  of  the  vowels. 

IV.  Experiment  and  Results 

In  our  experiments,  we  have  tested  the  TDNN  by  training  and  testing  it  to  recognize  the  long  vowel  sounds 
a,  e,  i,  o  eind  u.  The  data  consisted  of  10  training  and  16  testing  utterances  for  each  of  the  five  vowels  and  for 
each  of  the  2  female  speakers.  This  gives  a  total  of  100  training  and  160  testing  utterances. 

The  training  algorithm  used  was  a  modified  version  of  the  back  propagation  algorithm.  The  modifications 
were  made  in  order  to  adapt  the  algorithm  to  the  TDNN  architecture  [12].  One  other  modification  was  to  skip 
the  learning  cycle  if  the  output  error  was  less  than  1%  [2].  This  was  done  m  order  to  speed  up  the  training  time 
with  the  anticipation  that  if  the  network  has  satisfactorily  learned  the  particular  training  data  then  no  further 
training  will  be  necessary.  For  each  training  cycle,  the  forward  pass  of  the  network  computes  a  sequence  of  three 
outputs.  The  error  and  weight  adjustments  are  computed  for  each  of  the  three  outputs.  These  three  sets  of  weight 
adjustments  are  then  averaged  and  the  final  weight  adjustment  is  made  using  the  averaged  value.  By  treuning  the 
network  in  this  way,  it  is  taught  based  on  90ms  of  data  rather  than  70ms  with  very  little  increase  in  computation. 

With  the  network  architecture  as  described  above,  simulations  were  performed  in  an  attempt  to  recognize 
the  set  of  vowel  sounds.  The  simulation  was  written  in  FORTRAN  and  is  running  on  a  VAX  8550.  With  the  data 
described  above  we  were  able  to  train  the  network  with  100%  recognition  for  two  female  speakers.  This  result 
compares  well  with  that  of  Waibel  [3],  in  which  he  has  reported  a  recognition  rate  of  98.6%  for  a  single  speaker. 
Our  experiments  indicate  that  by  including  the  LPC  coefficients  and  by  using  the  total  utterance,  the  network 
was  able  to  correctly  learn  all  the  vowels  for  multiple  speakers. 

Two  different  approaches  were  tested  to  train  this  network  with  utterances  of  multiple  speaJcers.  First  the 
network  was  trained  with  only  one  speaker  and  100%  recognition  was  achieved  within  75  iterations  and  1:06:57.07 
(denotes  hr:min:sec)  of  CPU  time.  This  network  was  then  continued  to  be  trained  with  the  database  extended 
to  two  speakers.  This  resulted  in  100%  recognition  for  the  two  speakers  in  just  12  more  iterations  (0:27:02.33 
CPU  time).  Hence,  as  a  whole,  a  total  of  1:33:59.40  CPU  time  was  required  to  train  5  long  vowel  sounds  spoken 
by  two  speakers  for  100%  recognition  rate.  The  second  training  method  used  a  training  set  from  both  speakers 
from  the  very  beginning  of  the  training  cycle.  This  resulted  m  a  training  cycle  of  255  iterations  (9:40:01.50  CPU 
time).  Hence,  the  total  training  time  in  this  later  approach  was  more  than  6  times  greater  than  that  of  the  first 
method.  Another  interesting  observation  we  made  was  that  the  second  speaker  never  attained  100%  recognition 
from  its  database  alone  with  the  same  order  of  number  of  tr2uning  iterations  as  required  for  the  first  speaker. 
This  fact  demonstrates  two  facts,  1)  the  database  of  the  second  speaker  may  not  be  totally  representative  of  its 
testing  database  and  2)  the  network  does  generalize  on  the  data  that  it  learns.  This  is  evident  from  the  fact  that 
when  trained  with  the  combined  database  for  both  speakers,  the  network  was  able  to  learn  the  vowels  of  both 
speakers  rather  quickly  though  it  was  not  learning  the  second  speaker  when  it  was  being  trained  separately. 

V.  Conclusion 

This  work  shows  that  the  increased  amount  of  information  used  in  this  work  has  the  potential  to  increase 
the  recognition  capabilities.  At  the  present  time  we  are  training  the  network  on  an  extended  number  of  speakers. 
Although  it  is  making  progress,  the  learning  rate  is  slowed  considerably  with  the  larger  data  sets.  But  considering 
its  present  performance,  we  anticipate  that  these  results  may  be  improved  upon  in  the  future.  We  have  also 
shown  elsewhere  [11]  that  the  network  used  here  can  be  further  generalized  to  learn  different  phonemes  in  English 
and  that  groups  of  previously  trained  networks  can  be  combined  to  develop  very  large  networks.  Therefore  we 
conclude  by  stating  that  small  networks,  such  as  the  one  used  here,  may  be  trained  on  small  databases  with 
limited  recognition  abilities  and  may  later  be  combined  to  perform  larger  tasks. 
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SECTION  3.6  :  SPEECH  ANALYSIS  AND  SYNTHESIS  WITH  NON-LINEAR  PREDICTION 
SUMMARY 

One  of  the  common  assumptions  in  speech  has  been  that  speech  production  and  perception  are  essentially 
linear  processes  and  hence  one  can  accurately  model  speech  data  using  ‘linear  prediction’  based  methods  [1,  3]. 
Physiological  evidence  indicate  that  some  nonlinear  operation  does  occur  in  speech  production  and  perception 
[4,14,15].  It  is  also  known  that  linear  models  perform  poorly  for  certain  types  of  speech.  Recently  a  non-parametric 
nonlinear  chaos  model  showed  significant  improvement  over  the  usual  linear  models  [11]. 

In  this  Section  we  consider  the  applicability  of  certain  parametric  nonlinear  models  for  the  purpose  of  analy¬ 
sis/prediction/synthesis/coding  of  speech  signals.  To  the  best  of  our  knowledge,  these  models  have  not  yet  been 
exploited  for  speech  modeling.  Several  algorithms  for  simultaneous  estimation  of  the  nonlinear  as  well  as  the 
linear  prediction  parameters  of  speech  signals  are  being  considered.  Preliminary  studies  indicate  that  the  nonlin¬ 
ear  models  retain  substantially  more  information  when  compared  to  linear-only  models.  Preliminary  experiments 
on  telephone  quality  speech  data  clearly  and  consistently  indicate  that  there  is  a  significant  reduction  in  the 
prediction  error  when  the  bilinear  prediction  components  are  included  along  with  the  LPC  part.  The  results  of 
this  work  may  have  significant  effect  on  the  performance  accuracy  of  juay  speech  recognition/synthesis/coding 
system  that  currently  relies  on  linear  prediction  only. 

Linear  Prediction  :  In  linear  prediction,  the  speech  sample  at  a  time  instant  n  is  modeled  as  a  linear 
combination  of  past  p  speech  samples  as, 

p 

«(«)  =  ~  -  *)  +  c(n),  (1) 

i:=l 

where,  the  o*  ’s  are  the  Linear  Prediction  Coefficients  (LPC),  c(n)  denotes  the  prediction  error  and  the  first  term 
is  the  predicted  value  s(n), 

p 

s(n)  A  (2) 

The  LP  coefficients  are  found  by  minimizing  the  prediction  error  power  the  entire  data  length. 

This  results  in  a  set  of  linear  equations, 

Ra  =  r  (3) 

where  the  matrix  R  and  the  vector  r  contain  estimated  correlations  of  speech  data.  The  predictor  coefficients  are 
then  found  simply  by  inverting  the  R  matrix  and  forming, 

a  =  R-*r.  (4) 

Motivation  for  Nonlinear  Prediction  :  There  are  several  practical  rind  theoretical  grounds  behind 

the  assumption  of  linearity  even  though  there  is  physiological  evidence  that  nonlinearity  does  play  a  role  in 
speech  production  and  perception  [4,14,15]  and  modeling  [11].  Linear  assumption  makes  processing  relatively 
inexpensive  because  it  only  requires  the  solution  of  a  set  of  linear  equations  in  (3).  Also,  there  indeed  is  a  major 
contribution  from  the  linear  components  within  the  overall  nonlinear  structure.  Hence,  linear  prediction  provides 
a  practical  solution  with  reasonably  good  performance.  But  processing  speeds  on  digital  computers  have  increased 
dramatically  in  the  two  decades  following  the  introduction  of  linear  prediction  to  the  speech  community  [1].  Over 
the  last  decade  significant  advances  have  also  been  made  in  nonlinear  parameter  estimation  methods.  In  fact, 
there  already  exists  a  rich  mathematical  theory  on  nonlinear  modeling  as  well  as  nonlinear  system  identification 
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[2,5-10].  It  appears  that  the  potential  ofTered  by  these  techniques  are  yet  to  be  exploited  in  analyzing  and 
processing  speech  data  for  recognition,  coding  or  synthesis  purposes. 

It  may  be  noted  here  that  recently  Townshend  [11]  presented  a  nonparametric  and  nonlinear  prediction  model 
for  speech.  But  in  [11]  speech  was  modeled  as  deterministic  chaos  signal.  Here  we  model  speech  as  stochastic 
signal,  the  same  assumption  traditionally  made  in  linear  prediction. 

General  Non-linear  Case  :  Volterra  Series  :  In  general,  the  nonlinear  relationship  between  s(n)  and 
e(n)  is  given  by  a  Volterra  series  of  the  form  [12,  13], 

OO  00  00  oo  oo  oo 

s{n)  =  '^9kein- k)  +  fftac(n  - /)e(n  -  m)  -h  '^^Yi9i)ke{n  -  i)e{n  -  j)e{n  -  k)  +  ....  (5) 

i=0  ;=0  m=0  i=0>=0i:=0 

Note  that  there  is  no  auto- regression  of  the  output  in  this  model  and  hence  it  may  be  considered  a  general 
nonlinear  Moving-Average  (MA)  model.  Estimation  of  the  infinite  set  of  kernel  parameters  >00  it  ■  • .  appears 

computationeilly  difficult  because  the  kernels  gijt...  can  not  be  estimated  independently.  But  there  are  certain 
special  cases  when  one  can  estimate  these  separately  [6]  and  we  intend  to  analyze  if  those  special  assumptions 
are  applicable  to  speech  signals. 

Bilinear  Modeling  with  Finite  Number  of  Parameters  :  Equation  (5)  is  a  general  nonlinear  MA-type 
Volterra  model.  But  following  the  trends  in  the  linear  case,  it  seems  natural  to  expect  that  if  we  include  output 
regression  as  well,  i.e.,  if  we  employ  an  AR  or  ARMA-type  nonlinear  model,  we  may  obteun  equivalent  results 
even  with  finite  number  of  nonlinear  parameters.  In  fact  there  exists  one  such  model  called  ‘Bilinear  model’  which 
has  appeared  in  control  theory,  nonlinear  system  theory  and  time-series  analysis  literature  [2,  5-10].  The  general 
bilinear  predictor  is  expressed  as, 

p  1  m  k 

«(")  =  “  £  -  0  +  £  c,c(n  -  i)  + 

i=l  i  =  l 

Note  that  the  first  term  in  R.H.S.  is  the  AR  (or  LPC)  part,  the  second  term  is  the  MA  put  and  the  third  term 
is  the  bilinear  (multiplicative  ARMA)  part  2uid  it  is  denoted  as  BL(p,  q,  m,  k)  model.  The  finite  order  bilinear 
model  is  a  parsimonious  but  powerful  nonlinear  model.  In  fut,  it  has  been  shown  that  the  bilinear  model  with  a 
finite  parameter  set  can  approximate  any  ‘well  behaved’  infinite  Volterra  model  of  (5)  to  an  ubitrary  degree  of 
accuracy  [2]. 

The  parameter  estimation  methods  and  the  order  selection  methods  for  the  BL(p,  0,  m,  k)  case  have  been 
studied  by  Subba  Rao  in  [9,10].  The  idea  again  is  to  estimate  the  linear  as  well  as  the  nonlinear  coefficients 
simultaneously  by  minimizing  the  prediction  error  power  e^(n)  over  the  entire  data  length.  This  leads  to  a 

nonlinear  optimization  problem  and  Subba-Rao  has  proposed  a  repeated  residue  method  and  a  Newton-Raphson 
based  method. 

Aput  from  the  nonlinear  models  (5)  and  (6),  there  are  several  other  specialized  nonlinear  models  such  as 
Diagonal  Bilinear  model.  Subset  Bilinear  model  and  Threshold  AR  model  [9,  10].  These  models  may  or  may  not 
be  appropriate  for  speech  signals.  We  are  currently  investigating  these  models. 

Simulation  Results  :  In  order  to  compare  the  performance  accuracy  of  the  linear  and  bilinear  prediction 
methods  preliminary  tests  have  been  carried  out  using  the  word  ‘STOP’  from  the  Texas  Instruments  46- Word 
isolated  speech  database.  The  sampling  rate  is  12.5  kHz.  About  200  msec  of  speech  data  (Hamming  windowed 
and  4KHz  Lowpass  filtered)  was  sectioned  into  20  msec  blocks  with  overlaps  of  10  msec  between  adjacent  data 
blocks.  This  gave  us  20  data  blocks  to  perform  the  comparison  experiments.  Our  first  task  was  to  select  an 
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‘appropriate’  linear  predictor  model  order  (p)  which  will  be  effective  for  all  the  data  blocks.  The  optimum  order 
was  decided  according  to  the  MDL  (Minimum  Description  Length)  criterion  though  AIC  (Akaike  Information 
Criteria)  was  also  considered.  The  12th  order  AR  model  seemed  to  be  an  optimum  choice  for  many  of  the  blocks. 

To  compute  the  Bilinear  model  BL(12,0,m,  it)  parameters,  the  best  values  of  m  and  k  (with  m,k  <  5)  were 
determined  for  each  block  of  data.  The  parameter  estimates  were  found  for  each  m  and  k  first  by  the  repeated 
residue  method  and  then  using  the  Newton- Rap hson  technique  [9,  10,  6].  The  optimum  m  and  k  were  determined 
for  each  data  block  according  to  the  minimum  of  AIC.  The  iterative  methods  converged  within  5-20  iterations. 
It  was  found  consistently  for  each  data  block  that  the  prediction  errors  for  the  bilinear  case  were  around  3  dB 
lower  than  the  12th  order  AR  counterp^t. 

The  speech  data  for  a  part  of  the  word  ‘STOP’  is  shown  in  Fig.  la.  In  Fig.  lb  the  prediction  errors  with 
the  12th  order  AR  model  are  plotted.  Fig.  Ic  depicts  the  errors  in  the  case  of  the  Bilinear  model  BL(12, 0,  m,  k) 
with  optimum  m  and  k  (both  <  5).  Comparison  of  the  error  variances  for  several  blocks  for  the  12th  order 
AR  and  the  Bilinear  model  is  shown  in  Fig.  Id.  Equivalent  results  for  another  speech  sound  ‘NO’  is  shown  in 
Figures  2a-2d.  These  figures  clearly  show  that  the  bilinear  model  outperforms  the  linear  prediction  model  in 
all  cases.  Specifically,  about  3dB  improvement  in  prediction  was  observed  in  almost  all  cases.  Simil2u  results 
were  found  at  other  AR  model  orders  and  for  other  speech  signals  also.  Currently  we  are  exploring  the  effect 
of  nonlinear  modeling  in  speech  synthesis  and  coding.  Considering  the  improvement  in  prediction  that  we  have 
already  observed,  we  hope  to  see  equivalent  superior  performance  in  those  cases  also. 
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